BAYESIAN POISSON PROCESS PARTITION CALCULUS WITH
AN APPLICATION TO BAYESIAN LEVY MOVING AVERAGES 

By Lancelot F. James

The Hong Kong University of Science and Technology 

This article develops, and describes how to use, results concerning 
disintegrations of Poisson random measures. These results are fash- 
ioned as simple tools that can be tailor-made to address inferential 
questions arising in a wide range of Bayesian nonparametric and spa- 
tial statistical models. The Poisson disintegration method is based on 
the formal statement of two results concerning a Laplace functional 
change of measure and a Poisson Palm/Fubini calculus in terms of 
random partitions of the integers {1,..., n}. The techniques are analogous
to, but much more general than, techniques for the Dirichlet
process and weighted gamma process developed in [Ann. Statist. 12 
(1984) 351-357] and [Ann. Inst. Statist. Math. 41 (1989) 227-245]. In 
order to illustrate the flexibility of the approach, large classes of ran- 
dom probability measures and random hazards or intensities which 
can be expressed as functionals of Poisson random measures are de- 
scribed. We describe a unified posterior analysis of classes of discrete 
random probability measures which identifies and exploits features common to
all these models. The analysis circumvents many of the difficult issues 
involved in Bayesian nonparametric calculus, including a combinato- 
rial component. This allows one to focus on the unique features of 
each process which are characterized via real valued functions h. The 
applicability of the technique is further illustrated by obtaining ex- 
plicit posterior expressions for Levy-Cox moving average processes 
within the general setting of multiplicative intensity models. In addi- 
tion, novel computational procedures, similar to efficient procedures 
developed for the Dirichlet process, are briefly discussed for these 
models. 

1. Introduction. Let $N$ denote a Poisson random measure on an arbitrary
Polish space $\mathcal{W}$ characterized by its nonatomic sigma-finite mean
intensity,

$E[N(dw)] = \nu(dw)$.

That is to say, $N$ is a discrete random measure such that, for disjoint sets
$A$ and $B$, $N(A)$ is independent of $N(B)$. Additionally, for each bounded
set $B$, $N(B)$ is a Poisson random variable with finite mean $E[N(B)] =
\nu(B)$. Following Daley and Vere-Jones [7], $N$ takes its values in the space
of boundedly finite measures, say $\mathcal{M}$, equipped with an appropriate
sigma-field $\mathcal{B}(\mathcal{M})$. Denote the law of $N$ as $\mathcal{P}(dN|\nu)$. Additionally,
$BM(\mathcal{W})$ denotes the collection of Borel measurable functions of bounded
support on $\mathcal{W}$. The class of nonnegative functions in $BM(\mathcal{W})$ is denoted
as $BM_+(\mathcal{W})$. The law of $N$ is also uniquely characterized by its Laplace
functional given by

(1) $\mathcal{L}_N(f|\nu) = \int_{\mathcal{M}} e^{-N(f)}\mathcal{P}(dN|\nu) = \exp\left(-\int_{\mathcal{W}}(1 - e^{-f(w)})\nu(dw)\right)$

for each $f \in BM_+(\mathcal{W})$, where $N(f) = \int_{\mathcal{W}} f(w)N(dw)$. Note that the Laplace
functional is well defined for all positive functions $f$. For additional
information, see [22], Chapter 12. The Laplace functional, (1), will play a
fundamental role in our analysis. An essential part of our presentation involves
extensions of the following well-known disintegration for a joint measure of
a point $W \in \mathcal{W}$ and $N$:

(2) $N(dW)\mathcal{P}(dN|\nu) = \mathcal{P}(dN|\nu, W)E[N(dW)] = \mathcal{P}(dN|\nu, W)\nu(dW)$,

where $E[N(dW)] = \int_{\mathcal{M}} N(dW)\mathcal{P}(dN|\nu)$ and $\mathcal{P}(dN|\nu, W)$ is a conditional
distribution of $N$, given a point $W$, and coincides with the conditional law
of the random measure

$N + \delta_W$,

where $N$ is $\mathcal{P}(dN|\nu)$ and $W$ is a fixed point. The result in (2) is equivalent
to the Fubini theorem



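(3) $\int_{\mathcal{M}}\left[\int_{\mathcal{W}} g(w,N)N(dw)\right]\mathcal{P}(dN|\nu) = \int_{\mathcal{W}}\left[\int_{\mathcal{M}} g(w,N)\mathcal{P}(dN|\nu,w)\right]\nu(dw)$,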



for each measurable positive or integrable function $g$. Additionally, from the
definition of $\mathcal{P}(dN|\nu, W)$, the following change of measure formula holds:



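(4) $\int_{\mathcal{W}}\left[\int_{\mathcal{M}} g(w,N)\mathcal{P}(dN|\nu,w)\right]\nu(dw) = \int_{\mathcal{W}}\int_{\mathcal{M}} g(w,N+\delta_w)\mathcal{P}(dN|\nu)\nu(dw)$.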

Within the framework of Palm calculus, the disintegration (2) is well known
and may be found in [21] or [7], where $\mathcal{P}(dN|\nu, W)$ is an example of a Palm
distribution. The representation (2) has been used extensively in a variety
of important applications in probability; see, for instance, [35]. However, its
use has been absent from the Bayesian nonparametrics literature. Note that
since $N$ is not a random probability measure, $\mathcal{P}(dN|\nu, W)$ does not have the
interpretation of a posterior distribution. However, the use of (2) is already
enough to derive the posterior distribution of a variety of proper random
probability measures when $n = 1$.

Random measures based on Poisson processes play an important role in 
spatial statistical analysis and Bayesian nonparametric statistics. In this 
work we will introduce a methodology we call a Poisson process partition 
calculus that provides a unified treatment of the otherwise formidable pos- 
terior analysis of such random measures. The idea appears in the unpub- 
lished manuscript of James [18], which discusses a variety of applications. 
Here, we will present a streamlined discussion which focuses specifically on 
methodology to deduce key properties of general classes of random proba- 
bility measures and random intensities, analogous to those which make the 
Dirichlet process (see [13]) an attractive process for Bayesian non- and semi- 
parametric analysis. The methodology consists of two components which will 
be described in more detail in Section 2. The first component is a Laplace 
functional/exponential change of measure formula for Poisson random mea- 
sures, which can be seen as a form of functional exponential tilting or Esscher 
transform. The second is an extension of (2) in terms of partitions of the 
integers {1, . . . ,n}. One function of this extension is to allow one to bypass 
otherwise complex combinatorial arguments. In order to show explicitly the 
flexibility of our methods, we describe large classes of random probability 
measures in Section 1.1 which can be expressed as functionals of Poisson 
random measures. Additionally, in Section 1.1.1 we describe the structures 
of interest that are analogous to those for the Dirichlet process. Section 2 
describes the elements of the Poisson process partition calculus. Section 3 
discusses how to use the results in Section 2 to obtain the posterior analysis 
of the class of models described in Section 1.1. Section 4 presents a more 
explicit posterior analysis of a class of Levy moving averages or hazard rates 
subject to a multiplicative intensity model. We also show, briefly, how this 
analysis leads to the development of computational procedures analogous 
to those used in Dirichlet process mixture models. Section 5 presents the 
formal details of the proof of Proposition 2.2. 

1.1. General discrete random probability measures and related concepts.
Let $h$ denote a strictly positive jointly measurable function on $\mathcal{W} \times \mathcal{M}$. One
may define a general class of random probability measures, $P$, on $\mathcal{W}$ as
follows:

(5) $P(dw) = h(w,N)N(dw)$,

where $h$ is chosen such that $\int_{\mathcal{W}} h(w,N)N(dw) = 1$. The precise conditions
on $h$ may also place restrictions on $\nu$. Note, however, that countable additivity
of $P$ automatically follows from the additivity property of integrals with
respect to $N$. Formally, we will consider random elements $W_1,\ldots,W_n|P$
which are i.i.d. with distribution $P$, and $P$ is defined in (5) with law, say
$\mathcal{P}(dP|\nu)$, determined by a Poisson random measure $N$ with law $\mathcal{P}(dN|\nu)$.
This gives a decomposition of the joint distribution of $(\mathbf{W},P)$. We are
interested in identifying explicitly the disintegration of this joint distribution
in terms of the posterior distribution of $P|\mathbf{W}$, say $\pi(dP|\mathbf{W})$, and the
exchangeable marginal distribution of $\mathbf{W}$ given by

(6) $\left[\prod_{i=1}^{n} P(dW_i)\right]\mathcal{P}(dP|\nu) = \pi(dP|\mathbf{W})\int_{\mathcal{M}}\left[\prod_{i=1}^{n} P(dW_i)\right]\mathcal{P}(dP|\nu)$.



In principle, the most difficult task is, of course, to obtain a clear expression
for the posterior distribution $\pi(dP|\mathbf{W})$. This can be formidable for $n = 1$
and, due to obvious nonconjugacy and other issues to be discussed below,
becomes more difficult for general $n$. However, explicit expressions for the
marginal distribution and the posterior distribution are naturally linked.
Hence, it is instructive to examine more closely the marginal distribution.
By de Finetti's theorem, it is evident that the structure

(7) $\mathbb{P}(d\mathbf{W}|\nu) := \int_{\mathcal{M}}\left[\prod_{i=1}^{n} P(dW_i)\right]\mathcal{P}(dP|\nu)$



is exchangeable. It is a general analogue of the Blackwell and MacQueen [5] 
Polya urn distribution. Moreover, this distribution is such that the random 
vector W possibly consists of ties and hence, the posterior distribution itself, 
$\pi(dP|\mathbf{W})$, also depends on ties. This suggests, as is natural for exchangeable
structures (see the discussion in [25]), that the characterization of these
quantities can involve a substantial combinatorial component. Here we dis- 
cuss decompositions of (7) in terms of random partitions of the integers 
induced by these ties. 



1.1.1. Random partitions, EPPF, marginal distributions. It is clear that
there is a one-to-one correspondence between $\mathbf{W}$ and $(\mathbf{W}^*,\mathbf{p})$, where, using
notation similar to Lo [30], $\mathbf{W}^* = (W_1^*,\ldots,W_{n(\mathbf{p})}^*)$ denotes the distinct
values of $\mathbf{W}$ and $\mathbf{p} = \{C_1,\ldots,C_{n(\mathbf{p})}\}$ stands for a partition of $\{1,\ldots,n\}$ of size
$n(\mathbf{p}) \le n$ recording which observations are equal. The number of elements
in the $j$th cell, $C_j := \{i : W_i = W_j^*\}$, of the partition is indicated by $e_j$, for
$j = 1,\ldots,n(\mathbf{p})$, so that $\sum_{j=1}^{n(\mathbf{p})} e_j = n$. When it is necessary to emphasize a
further dependence on $n$, we will also use the notation $e_{j,n} := e_j$. It follows
that the marginal distribution of $\mathbf{W}$ can be expressed in terms of a conditional
distribution of $\mathbf{W}|\mathbf{p}$, which is the same as a conditional distribution
of the unique values $\mathbf{W}^*|\mathbf{p}$, and the marginal distribution of $\mathbf{p}$. The marginal
distribution of $\mathbf{p}$, denoted as $\pi(\mathbf{p})$ or $p(e_1,\ldots,e_{n(\mathbf{p})})$, is an exchangeable
partition probability function (EPPF), that is, a probability distribution on $\mathbf{p}$
which is exchangeable in its arguments and only depends on the size of each
cell. The best known case of an EPPF is the variant of the Ewens sampling
formula (ESF) associated with the Dirichlet process with total mass $\theta > 0$,
given as

(8) $p_\theta(e_1,\ldots,e_{n(\mathbf{p})}) = \frac{\theta^{n(\mathbf{p})}\Gamma(\theta)}{\Gamma(\theta+n)}\prod_{j=1}^{n(\mathbf{p})}\Gamma(e_j)$,

which was derived by Ewens [12] and Antoniak [3]. The EPPF can be
interpreted as the distribution of the configuration of ties (clusters) among
the $\mathbf{W}$. To understand this relationship further, note that, analogous to the
case of the Dirichlet process, one can define the following probabilities
relevant to (7). Suppose that $W_{n+1}$ is a newly observed variable. Then the
probability that $W_{n+1}$ is distinct from the values $\mathbf{W}$, given $\mathbf{p}$, is

(9) $\mathbb{P}(W_{n+1} \text{ is new}\,|\,\mathbf{p}) = q_{0,n} = \frac{p(e_1,\ldots,e_{n(\mathbf{p})},1)}{p(e_1,\ldots,e_{n(\mathbf{p})})}$

and for $j = 1,\ldots,n(\mathbf{p})$, the probability that $W_{n+1} = W_j^*$, given $\mathbf{p}$, is

(10) $\mathbb{P}(W_{n+1} = W_j^*\,|\,\mathbf{p}) = q_{j,n} = \frac{p(e_1,\ldots,e_j+1,\ldots,e_{n(\mathbf{p})})}{p(e_1,\ldots,e_{n(\mathbf{p})})}$.



It is known that, for the case of (8), one has $q_{0,n} = \theta/(\theta+n)$ and $q_{j,n} =
e_j/(\theta+n)$, which are the probabilities associated with the Chinese restaurant
process (see [38], page 60) and the Blackwell-MacQueen prediction rule. In
principle, one can use the probabilities in (9) and (10) to generate samples
from $\mathbf{p}$, according to the EPPF, via a generalized Chinese restaurant process.
See [15] for a discussion. However, we point out that, in general, unlike the
case of the Dirichlet process, these probabilities are not the probabilities, say
$\mathbb{P}(W_{n+1} = W_j^*|\mathbf{W})$ for $j = 1,\ldots,n(\mathbf{p})$, which correspond to the appropriate
prediction rule of $W_{n+1}|\mathbf{W}$. Rather, the following relationship holds: for
$j = 1,\ldots,n(\mathbf{p})$,

$\mathbb{P}(W_{n+1} = W_j^*|\mathbf{p}) = \int \mathbb{P}(W_{n+1} = W_j^*|\mathbf{W})\pi(d\mathbf{W}^*|\mathbf{p})$,

where $\pi(d\mathbf{W}^*|\mathbf{p})$ denotes the distribution of $\mathbf{W}|\mathbf{p}$ in terms of the unique
values $\mathbf{W}^*$.
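
As a numerical illustration of (8), (9) and (10), the following minimal Python sketch evaluates the Ewens EPPF, checks that it sums to one over all partitions of {1,...,n}, and recovers the prediction probabilities as ratios of EPPF values. The function names (`set_partitions`, `ewens_eppf`, `predictive_probs`) and the numerical values of theta and n are illustrative assumptions and do not appear in the original text.

```python
from math import gamma, isclose


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # Place `first` into each existing block in turn ...
        for k in range(len(smaller)):
            yield smaller[:k] + [[first] + smaller[k]] + smaller[k + 1:]
        # ... or let it open a new block.
        yield [[first]] + smaller


def ewens_eppf(sizes, theta):
    """Equation (8): theta^{n(p)} Gamma(theta) / Gamma(theta + n) * prod_j Gamma(e_j)."""
    n = sum(sizes)
    value = theta ** len(sizes) * gamma(theta) / gamma(theta + n)
    for e in sizes:
        value *= gamma(e)
    return value


def predictive_probs(sizes, theta):
    """Equations (9) and (10), computed as ratios of EPPF values."""
    base = ewens_eppf(sizes, theta)
    q_new = ewens_eppf(sizes + [1], theta) / base
    q_old = [
        ewens_eppf(sizes[:j] + [sizes[j] + 1] + sizes[j + 1:], theta) / base
        for j in range(len(sizes))
    ]
    return q_new, q_old


theta, n = 1.5, 5  # illustrative values only
total = sum(
    ewens_eppf([len(block) for block in p], theta)
    for p in set_partitions(list(range(n)))
)
q_new, q_old = predictive_probs([3, 1, 2], theta)
print(total, q_new, q_old)
```

For cell sizes (3, 1, 2), so that n = 6, the ratios should agree with theta/(theta + 6) and e_j/(theta + 6), which is the Chinese restaurant rule quoted above. For a general EPPF the same ratio construction applies, but the closed forms do not.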



Remark 1. The general EPPF concept is described in [36, 37, 38, 39],
where a variety of applications are discussed. The notation $\sum_{\mathbf{p}}$ will be used
to denote the sum over all possible partitions of the integers $\{1,\ldots,n\}$.
A general discussion of the marginal structures $\mathbb{P}(d\mathbf{W}|\nu)$, such as that
presented here, does not seem available. In the language of the theory of random
measures, $\mathbb{P}(d\mathbf{W}|\nu)$ is also seen to be the $n$th moment measure of $P$. That
is, one can use it to obtain the integer moments of $P$ and related quantities.

2. Poisson process partition calculus. So far we have pinpointed the type 
of structures we would like to obtain. However, what is missing is a system- 
atic and easy mechanism to get at explicit expressions for these quantities. 
The idea of this paper is to focus on the utilization of (partition based) 
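disintegration results related to the joint measure of $(\mathbf{W}, N)$ given by

(11) $\left[\prod_{i=1}^{n} N(dW_i)\right]\mathcal{P}(dN|\nu) = \mathcal{P}(dN|\nu,\mathbf{W})\int_{\mathcal{M}}\left[\prod_{i=1}^{n} N(dW_i)\right]\mathcal{P}(dN|\nu)$.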



The quantity (11) is not a proper distribution. However, it is this general 
form on the left-hand side which appears, explicitly or in augmented form, 
in all the models that will be discussed. The right-hand side, similar to that 
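of (6), consists of a conditional distribution of $N|\mathbf{W}$, $\mathcal{P}(dN|\nu,\mathbf{W})$, and a
sigma-finite marginal measure of $\mathbf{W}$,

(12) $M(d\mathbf{W}|\nu) = \int_{\mathcal{M}}\left[\prod_{i=1}^{n} N(dW_i)\right]\mathcal{P}(dN|\nu)$,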



which behaves in many respects like an exchangeable urn distribution and, 
importantly, can be expressed in terms of (W*,p). These quantities are 
direct extensions of (2). The main purpose of this section is to describe two 
results concerning the Poisson process and the disintegration of (11) which 
are fashioned as simple tools that can be tailor-made to address inferential 
questions arising in a wide range of Bayesian nonparametric models. 



2.1. Basic tools. First, an exponential change of measure, or disintegration,
formula based on Laplace functionals is given below. This is a simple
functional extension of an analogous result for Levy processes on $\mathbb{R}$ or, more
generally, $\mathbb{R}^d$, which may be found in [27], Proposition 2.1.3. Such an
operation is commonly called exponential tilting.

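Proposition 2.1. For each $f \in BM_+(\mathcal{W})$ and each $g$ on $(\mathcal{M},\mathcal{B}(\mathcal{M}))$,

$\int_{\mathcal{M}} g(N)e^{-N(f)}\mathcal{P}(dN|\nu) = \mathcal{L}_N(f|\nu)\int_{\mathcal{M}} g(N)\mathcal{P}(dN|e^{-f}\nu)$,

where $\mathcal{P}(dN|e^{-f}\nu)$ is the law of a Poisson process with intensity $e^{-f(w)}\nu(dw)$.
In other words, the following absolute continuity result holds: $e^{-N(f)}\mathcal{P}(dN|\nu) =
\mathcal{L}_N(f|\nu)\mathcal{P}(dN|e^{-f}\nu)$. The result extends to any nonnegative measurable $f$
such that $\int_{\mathcal{W}}(1 - e^{-f(w)})\nu(dw) < \infty$.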
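Proof. By the unicity of Laplace functionals for random measures
on $\mathcal{W}$, it suffices to check this result for the case $g(N) = e^{-N(h)}$ for $h \in
BM_+(\mathcal{W})$. It follows that

$\int_{\mathcal{M}} e^{-N(f+h)}\mathcal{P}(dN|\nu) = \mathcal{L}_N(f|\nu)\int_{\mathcal{M}} e^{-N(h)}\mathcal{P}_f(dN)$,

where, for the time being, $\mathcal{P}_f$ denotes some law on $\mathcal{M}$. Simple algebra,
dividing $\mathcal{L}_N(f+h|\nu)$ by $\mathcal{L}_N(f|\nu)$ in (1), shows that

$\int_{\mathcal{M}} e^{-N(h)}\mathcal{P}_f(dN) = \exp\left(-\int_{\mathcal{W}}(1 - e^{-h(w)})e^{-f(w)}\nu(dw)\right)$

and, hence, $\mathcal{P}_f(dN) = \mathcal{P}(dN|e^{-f}\nu)$. The extension holds by the same
argument, since $\mathcal{L}_N(f|\nu) > 0$. □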
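
The identity in Proposition 2.1 can be checked numerically in a simple special case. The sketch below assumes, purely for illustration, that $f$ is constant on each of two disjoint cells $A_1, A_2$ and vanishes elsewhere, and that $g$ depends on $N$ only through the counts $N(A_1), N(A_2)$, which are independent Poisson variables. The cell masses, the values of $f$, the test functional `g` and the truncation point `cutoff` are all hypothetical choices.

```python
from itertools import product
from math import exp, factorial, isclose


def poisson_pmf(k, mean):
    """P(K = k) for K ~ Poisson(mean)."""
    return exp(-mean) * mean ** k / factorial(k)


def expectation(g, means, cutoff=60):
    """E[g(N(A_1), ..., N(A_K))] for independent Poisson counts, truncated at `cutoff`."""
    total = 0.0
    for counts in product(range(cutoff), repeat=len(means)):
        prob = 1.0
        for k, mean in zip(counts, means):
            prob *= poisson_pmf(k, mean)
        total += g(counts) * prob
    return total


nu = [2.0, 0.7]  # nu(A_1), nu(A_2): hypothetical cell masses
f = [0.5, 1.3]   # value of f on each cell (piecewise constant by assumption)


def g(counts):
    """An arbitrary test functional of the cell counts."""
    return (counts[0] + 1) * counts[1] ** 2


def tilt(counts):
    """e^{-N(f)} when f is constant on each cell."""
    return exp(-sum(fk * ck for fk, ck in zip(f, counts)))


# Left-hand side: integral of g(N) e^{-N(f)} under the law with intensity nu.
lhs = expectation(lambda c: g(c) * tilt(c), nu)
# Right-hand side: Laplace functional times the integral of g under intensity e^{-f} nu.
laplace = exp(-sum((1.0 - exp(-fk)) * nk for fk, nk in zip(f, nu)))
rhs = laplace * expectation(g, [exp(-fk) * nk for fk, nk in zip(f, nu)])
print(lhs, rhs)
```

The two printed values should agree up to truncation and rounding error, reflecting that exponential tilting of a Poisson law simply rescales each cell mass by $e^{-f}$.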

Now, while indeed it is possible to use (2) repeatedly to analyze many of 
the models discussed in Section 1.1, such an analysis does not circumvent the 
need for what might be formidable combinatorial analysis. One may note, 
for instance, the nontrivial arguments used by Antoniak [3] to derive (8). 
With this in mind, the next result, in Proposition 2.2, gives a partition-based 
representation of (11) which serves to significantly simplify such derivations 
for more general models. We will delay a proof of Proposition 2.2 until 
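Section 5. First, we formally identify the law $\mathcal{P}(dN|\nu,\mathbf{W})$ appearing in (11)
as a conditional distribution of $N$, given the points $\mathbf{W}$, which is equivalent
to the law of the random measure

(13) $N_n^* = N + \sum_{j=1}^{n(\mathbf{p})}\delta_{W_j^*}$,

where $N$ is $\mathcal{P}(dN|\nu)$ independent of the points $\mathbf{W}$. Note, by definition,
for any measurable function $g$ on $\mathcal{W}^n \times \mathcal{M}$, that $\mathcal{P}(dN|\nu,\mathbf{W})$ satisfies the
following change of variable, as in the case for $n = 1$:

(14) $\int_{\mathcal{M}} g(\mathbf{W},N)\mathcal{P}(dN|\nu,\mathbf{W}) = \int_{\mathcal{M}} g\left(\mathbf{W}, N + \sum_{j=1}^{n(\mathbf{p})}\delta_{W_j^*}\right)\mathcal{P}(dN|\nu)$.

Using (14), it follows that the conditional Laplace functional of $N$ with
respect to $\mathcal{P}(dN|\nu,\mathbf{W})$ is

(15) $\int_{\mathcal{M}} e^{-N(f)}\mathcal{P}(dN|\nu,\mathbf{W}) = \left[\prod_{j=1}^{n(\mathbf{p})} e^{-f(W_j^*)}\right]\int_{\mathcal{M}} e^{-N(f)}\mathcal{P}(dN|\nu) = \mathcal{L}_N(f|\nu)\prod_{j=1}^{n(\mathbf{p})} e^{-f(W_j^*)}$.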

We now present the formal partition based disintegration of (11). 
Proposition 2.2. Suppose that $(\mathbf{W}, N)$ are measurable elements in the
space $\mathcal{W}^n \times \mathcal{M}$ having the joint measure in (11), where $N$ is a Poisson
random measure with sigma-finite nonatomic mean measure $\nu$. Then the
following disintegration holds:

$\left[\prod_{i=1}^{n} N(dW_i)\right]\mathcal{P}(dN|\nu) = \mathcal{P}(dN|\nu,\mathbf{W})\prod_{j=1}^{n(\mathbf{p})}\nu(dW_j^*)$,

where $\mathcal{P}(dN|\nu,\mathbf{W})$ corresponds to the law of $N$ determined by (15) and is
representable in distribution as (13). The moment measure is expressible via
conditional moment measures as

$M(d\mathbf{W}|\nu) = \prod_{j=1}^{n(\mathbf{p})}\nu(dW_j^*) = \nu(dW_1)\prod_{i=2}^{n}\left[\nu(dW_i) + \sum_{j=1}^{n(\mathbf{p}_{i-1})}\delta_{W_j^*}(dW_i)\right]$,

where $n(\mathbf{p}_{i-1})$ is the size of the partition $\mathbf{p}_{i-1}$ of $\{1,\ldots,i-1\}$ encoding the
ties between $W_1,\ldots,W_{i-1}$.
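
One consequence of the partition form of the moment measure is easy to check numerically: integrating over $A^n$ for a bounded set $A$, and summing over the possible tie configurations, gives $E[N(A)^n] = \sum_{\mathbf{p}} \nu(A)^{n(\mathbf{p})}$. The sketch below compares this partition sum with the $n$th raw moment of a Poisson variable computed by direct summation. The helper names, the value used for $\nu(A)$ and the truncation point are illustrative assumptions.

```python
from math import exp, isclose


def partitions_by_size(n):
    """Count partitions of {1,...,n} by number of cells n(p), by direct enumeration."""
    counts = {}

    def extend(i, cells):
        if i == n:
            counts[cells] = counts.get(cells, 0) + 1
            return
        for _ in range(cells):      # the next element joins one of the existing cells
            extend(i + 1, cells)
        extend(i + 1, cells + 1)    # or it starts a new cell

    extend(0, 0)
    return counts


def poisson_raw_moment(n, mean, cutoff=200):
    """E[K^n] for K ~ Poisson(mean), by truncated summation."""
    term, total = exp(-mean), 0.0   # term holds P(K = k), starting at k = 0
    for k in range(cutoff):
        total += k ** n * term
        term *= mean / (k + 1)
    return total


n, mass = 4, 1.7                    # mass plays the role of nu(A); illustrative value
by_size = partitions_by_size(n)
partition_sum = sum(c * mass ** k for k, c in by_size.items())
print(by_size, partition_sum, poisson_raw_moment(n, mass))
```

The enumeration is exponential in $n$ and is meant only to make the combinatorial structure visible; the point of Proposition 2.2 is that such sums never need to be carried out by hand.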

One can combine Proposition 2.1 and Proposition 2.2, yielding the fol- 
lowing useful result which will be used in Section 4. 



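Proposition 2.3. Suppose that $(\mathbf{W}, N)$ are measurable elements in the
space $\mathcal{W}^n \times \mathcal{M}$, where $N$ is a Poisson random measure with sigma-finite
nonatomic mean measure $\nu$. Then for each nonnegative measurable $f$ such
that $\int_{\mathcal{W}}(1 - e^{-f(w)})\nu(dw) < \infty$, the following disintegration holds:

$\left[\prod_{i=1}^{n} N(dW_i)\right]e^{-N(f)}\mathcal{P}(dN|\nu) = \mathcal{L}_N(f|\nu)\mathcal{P}(dN|e^{-f}\nu,\mathbf{W})\prod_{j=1}^{n(\mathbf{p})} e^{-f(W_j^*)}\nu(dW_j^*)$.

Here $M(d\mathbf{W}|e^{-f}\nu) = \prod_{j=1}^{n(\mathbf{p})} e^{-f(W_j^*)}\nu(dW_j^*)$ is the $n$th moment measure of a
Poisson random measure with intensity $e^{-f(w)}\nu(dw)$.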



Proof. The proof of this result follows by first applying Proposition 2.1
to get

$\left[\prod_{i=1}^{n} N(dW_i)\right]e^{-N(f)}\mathcal{P}(dN|\nu) = \mathcal{L}_N(f|\nu)\left[\prod_{i=1}^{n} N(dW_i)\right]\mathcal{P}(dN|e^{-f}\nu)$.

Conclude the result by applying Proposition 2.2 with $e^{-f(w)}\nu(dw)$ in place
of $\nu(dw)$. □






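3. Formal Bayesian methodology. We now describe how to use the results
in Section 2 to obtain desired results for models such as (6). First
define

(16) $\psi_n(\mathbf{W}) = \int_{\mathcal{M}}\left[\prod_{j=1}^{n(\mathbf{p})}[h(W_j^*,N)]^{e_j}\right]\mathcal{P}(dN|\nu,\mathbf{W}) = \int_{\mathcal{M}}\left[\prod_{j=1}^{n(\mathbf{p})}[h(W_j^*,N_n^*)]^{e_j}\right]\mathcal{P}(dN|\nu)$.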



Then an application of Proposition 2.2 yields the following result. 



Theorem 3.1. Let $P$ denote a random probability defined as in (5),
where $N$ is a Poisson random measure with intensity $\nu$. Let $\mathbf{W} = (W_1,W_2,\ldots,W_n)$
denote a vector of random elements on a Polish space $\mathcal{W}$ such that $W_1,\ldots,W_n|P$
are i.i.d. with distribution $P$. Then the following results hold:

(i) The posterior distribution of $N|\mathbf{W}$, $\pi(dN|\nu,\mathbf{W})$, corresponds to the
conditional law of the random measure

(17) $N_n^* = N + \sum_{j=1}^{n(\mathbf{p})}\delta_{W_j^*}$,

where now $\pi^*(dN|\mathbf{W}) = [\psi_n(\mathbf{W})]^{-1}\mathcal{P}(dN|\nu)\prod_{j=1}^{n(\mathbf{p})}[h(W_j^*,N_n^*)]^{e_j}$ is the
conditional law of $N$, in (17), given $\mathbf{W}$.

(ii) The posterior distribution of $P|\mathbf{W}$ is equivalent to the conditional
distribution of the random probability measure

$P_n^*(dw) = h(w,N_n^*)N_n^*(dw) = h(w,N_n^*)N(dw) + \sum_{j=1}^{n(\mathbf{p})} h(W_j^*,N_n^*)\delta_{W_j^*}(dw)$,

where the law of $N|\mathbf{W}$ is $\pi^*(dN|\mathbf{W})$.

(iii) The joint exchangeable marginal distribution of $\mathbf{W}$ is given by $\mathbb{P}(d\mathbf{W}|\nu) =
\psi_n(\mathbf{W})\prod_{j=1}^{n(\mathbf{p})}\nu(dW_j^*)$. Additionally, the EPPF derived from the marginal
distribution of $\mathbf{W}$ is expressible as

(18) $p(e_1,\ldots,e_{n(\mathbf{p})}) = \int_{\mathcal{W}^{n(\mathbf{p})}}\psi_n(\mathbf{w})\prod_{j=1}^{n(\mathbf{p})}\nu(dw_j^*)$.



Proof. The key point to note is that, since $P$ is a functional of $N$,
results for the joint distribution of $(\mathbf{W}, P)$ follow from the corresponding
joint distribution of $(\mathbf{W}, N)$. From (6), the joint distribution of $(\mathbf{W}, N)$
is expressible as $[\prod_{i=1}^{n} h(W_i,N)][\prod_{i=1}^{n} N(dW_i)]\mathcal{P}(dN|\nu)$. Applying
Proposition 2.2, along with the identity $\prod_{i=1}^{n} h(W_i,N) = \prod_{j=1}^{n(\mathbf{p})}[h(W_j^*,N)]^{e_j}$, the
joint distribution of $(\mathbf{W}, N)$ can be expressed as

(19) $\left[\prod_{j=1}^{n(\mathbf{p})}[h(W_j^*,N)]^{e_j}\right]\mathcal{P}(dN|\nu,\mathbf{W})\prod_{j=1}^{n(\mathbf{p})}\nu(dW_j^*)$.

One now only needs to apply simple Bayes rule to obtain an expression in
terms of the posterior distribution of $N|\mathbf{W}$ and the marginal distribution
of $\mathbf{W}$. Formally, to obtain the marginal distribution of $\mathbf{W}$, one integrates
out $N$ in (19), yielding the form of $\mathbb{P}(d\mathbf{W}|\nu)$ in (iii). The expression in (18)
is then evident. Now, since $\psi_n(\mathbf{W}) > 0$, it follows that the posterior
distribution of $N|\mathbf{W}$ is $\pi(dN|\mathbf{W}) = [\psi_n(\mathbf{W})]^{-1}[\prod_{j=1}^{n(\mathbf{p})}[h(W_j^*,N)]^{e_j}]\mathcal{P}(dN|\nu,\mathbf{W})$.
Statement (i) now follows by the change of measure formula (14). That is,
the posterior Laplace functional of $N|\mathbf{W}$ is

$\int_{\mathcal{M}} e^{-N(f)}\pi(dN|\mathbf{W}) = \left[\int_{\mathcal{M}} e^{-N(f)}\pi^*(dN|\mathbf{W})\right]\prod_{j=1}^{n(\mathbf{p})} e^{-f(W_j^*)}$.

Statement (ii) follows from the fact that $P(dw) = h(w,N)N(dw)$ and the
representations of the posterior distribution of $N|\mathbf{W}$ in statement (i). □

Remark 2. Statement (ii) describes the posterior distribution of $P|\mathbf{W}$
via the distribution of $P_n^*$ determined by $\pi^*(dN|\mathbf{W})$. As one application,
the prediction rule of $W_{n+1}|\mathbf{W}$ can be readily computed as

$\mathbb{P}(dW_{n+1}|\mathbf{W}) = \int_{\mathcal{M}} P_n^*(dW_{n+1})\pi^*(dN|\mathbf{W})$.

3.1. Discrete random probability measures defined by completely random 
measures. The random probability measures defined in (5) are actually 
a bit different than the random probability measures commonly used in 
Bayesian nonparametrics. In particular, as we shall show, the class P con- 
tains augmented forms of, say, the Dirichlet process or Doksum's [8] neu- 
tral to the right processes. In Bayesian nonparametrics many random prob- 
ability measures are actually functionals of completely random measures 
(see [23, 26]), say, $\mu$ defined over a Polish space $\mathcal{Y}$. The class of completely
random measures contains, for instance, the gamma process and the random
hazard processes discussed in [14]. Completely random measures, ignoring
fixed points of discontinuity, are representable in a distributional sense as
functionals of Poisson random measures. We now describe this construction.
Specify $\mathcal{W} = \mathcal{S} \times \mathcal{Y}$, where $\mathcal{S} = (0,\infty)$. Additionally, for points $w = (s,y)$,
$N(ds,dy)$ denotes a Poisson random measure with mean intensity

$E[N(ds,dy)] = \nu(ds,dy) = \rho(ds|y)\eta(dy)$.

Furthermore, it is assumed that $\rho$ and $\eta$ are selected such that, for each
bounded set $B$ in $\mathcal{Y}$,

(20) $\int_B\int_{\mathcal{S}}\min(s,1)\rho(ds|y)\eta(dy) < \infty$.

Now define a random measure $\mu$ on $\mathcal{Y}$ such that it may be represented in a
distributional sense as

(21) $\mu(dy) = \int_{\mathcal{S}} sN(ds,dy)$.

Following Daley and Vere-Jones [7], the condition (20) guarantees that $\mu$ is
in the space of boundedly finite measures $\mathcal{M}$ equipped with an appropriate
sigma-field, $\mathcal{B}(\mathcal{M})$. If $\rho$ does not depend on $y$, then $\mu$ is said to be
homogeneous. Furthermore, if $\mathcal{Y} = (0,\infty)$, then $\mu$ is sometimes called a subordinator,
that is to say, a nonnegative Levy process with stationary increments.
Similar to the definition of $P$ in (5), one can define a general class of discrete
random probability measures on $\mathcal{Y}$ as

(22) $P_\mu(dy) = q(y,\mu)\mu(dy) = q(y,\mu)\int_{\mathcal{S}} sN(ds,dy)$,

where $q$ is a strictly positive measurable function such that $P_\mu$ is a
well-defined random probability measure. Note that the second representation
in (22) reveals, via a natural augmentation, a class of random probability
measures on $\mathcal{S} \times \mathcal{Y}$ defined as

(23) $P_\mu(ds,dy) = q(y,\mu)sN(ds,dy)$.

That is to say, $P_\mu(ds,dy)$ defined in (23) is a special case of (5) with the
choice of $h(s,y,N) = sq(y,\mu)$.

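Now set $W_i = (J_i,Y_i)$ for $i = 1,\ldots,n$ points in $\mathcal{S} \times \mathcal{Y}$ and denote the
unique values as $W_j^* = (J_{j,n},Y_j^*)$ for $j = 1,\ldots,n(\mathbf{p})$. Additionally, define a
random measure

$\mu_n^*(dy) = \int_0^\infty sN_n^*(ds,dy) = \mu(dy) + \sum_{j=1}^{n(\mathbf{p})} J_{j,n}\delta_{Y_j^*}(dy)$.

Noting the form in (23), it follows that for $\mathbf{W} = (\mathbf{J},\mathbf{Y})$,

$\psi_n(\mathbf{J},\mathbf{Y}) = \left[\prod_{j=1}^{n(\mathbf{p})} J_{j,n}^{e_j}\right]\phi_n(\mathbf{J},\mathbf{Y})$,

where $\phi_n(\mathbf{J},\mathbf{Y}) = \int_{\mathcal{M}}\left[\prod_{j=1}^{n(\mathbf{p})}[q(Y_j^*,\mu_n^*)]^{e_j}\right]\mathcal{P}(dN|\nu)$.

Additionally, let $\mathbf{s} = (s_1,\ldots,s_n)$ and $(s_{1,n},\ldots,s_{n(\mathbf{p}),n})$ denote the arguments
of $\mathbf{J} = (J_1,\ldots,J_n)$ and the collection $(J_{j,n})$, respectively. These facts lead to
the following result.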
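Theorem 3.2. Let $P_\mu$ denote a random probability defined as in (22),
where $N$ is a Poisson random measure on $\mathcal{W} = \mathcal{S} \times \mathcal{Y}$, with mean intensity
$\nu(ds,dy) = \rho(ds|y)\eta(dy)$. Let $\mathbf{Y} = (Y_1,Y_2,\ldots,Y_n)$ denote a vector of random
elements on $\mathcal{Y}$ such that $Y_1,\ldots,Y_n|P_\mu$ are i.i.d. with distribution $P_\mu$. Then
the following results hold:

(i) The posterior distribution of $N|\mathbf{Y}$ corresponds to the conditional law
of the random measure $N_n^* = N + \sum_{j=1}^{n(\mathbf{p})}\delta_{J_{j,n},Y_j^*}$, where the conditional law of
$N$ in this representation, given $\mathbf{J},\mathbf{Y}$, is

$\pi^*(dN|\mathbf{J},\mathbf{Y}) = [\phi_n(\mathbf{J},\mathbf{Y})]^{-1}\left[\prod_{j=1}^{n(\mathbf{p})}[q(Y_j^*,\mu_n^*)]^{e_j}\right]\mathcal{P}(dN|\nu)$.

Additionally, the distribution of $\mathbf{J}|\mathbf{Y}$ is $\mathbb{P}(d\mathbf{J}|\mathbf{Y},\nu) \propto \phi_n(\mathbf{J},\mathbf{Y})\prod_{j=1}^{n(\mathbf{p})} J_{j,n}^{e_j}\rho(dJ_{j,n}|Y_j^*)$.
The law of $\mu_n^*(dy) = \int_{\mathcal{S}} sN_n^*(ds,dy)$, given $\mathbf{Y}$, determined by the law of $N_n^*|\mathbf{Y}$,
corresponds to the posterior distribution of $\mu|\mathbf{Y}$.

(ii) The posterior distribution of $P_\mu|\mathbf{Y}$ is equivalent to the conditional
distribution, given $\mathbf{Y}$, of the random probability measure $P_{\mu_n^*}(dy) = q(y,\mu_n^*)\mu_n^*(dy)$.

(iii) $\mathbb{P}(d\mathbf{Y}|\nu) = \left[\int_{\mathcal{S}^{n(\mathbf{p})}}\phi_n(\mathbf{s},\mathbf{Y})\prod_{j=1}^{n(\mathbf{p})} s_{j,n}^{e_j}\rho(ds_{j,n}|Y_j^*)\right]\prod_{j=1}^{n(\mathbf{p})}\eta(dY_j^*)$ is the
exchangeable marginal distribution of $\mathbf{Y}$. The EPPF derived from the marginal
distribution of $\mathbf{Y}$ is expressible as

(24) $p(e_1,\ldots,e_{n(\mathbf{p})}) = \int_{\mathcal{S}^{n(\mathbf{p})}\times\mathcal{Y}^{n(\mathbf{p})}}\phi_n(\mathbf{s},\mathbf{y})\prod_{j=1}^{n(\mathbf{p})} s_{j,n}^{e_j}\rho(ds_{j,n}|y_j^*)\eta(dy_j^*)$.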



Proof. First note the representation $\mu(dY_i) = \int_{\mathcal{S}} J_iN(dJ_i,dY_i)$ for $i =
1,\ldots,n$. Augmenting the joint distribution of $(\mathbf{Y},P_\mu)$ by $\mathbf{J}$ yields the
distribution of $(\mathbf{J},\mathbf{Y},P_\mu)$. Noting that $\mathbf{W} = (\mathbf{J},\mathbf{Y})$, and using the identity
$\prod_{i=1}^{n} J_i = \prod_{j=1}^{n(\mathbf{p})} J_{j,n}^{e_j}$, the posterior distribution of $N|\mathbf{J},\mathbf{Y}$ and, hence, that of
$\mu$ and $P_\mu$, follows directly from Theorem 3.1. Similarly, the joint distribution
of $\mathbf{J},\mathbf{Y}$ is given by statement (iii) of Theorem 3.1. This in turn yields the
distributions of $\mathbf{J}|\mathbf{Y}$ and $\mathbf{Y}$. The distribution of $P_\mu$ follows from the fact
that $P_\mu(dy) = \int_{\mathcal{S}} P_\mu(dy,ds)$. □

Remark 3. The results in Theorems 3.1 and 3.2 serve the purpose of 
exploiting the common features of many random probability measures. This 
in turn allows one to avoid otherwise cumbersome intermediate arguments 
and focus on the unique features of each process. That is to say, similar to 
parametric Bayesian results obtained via classical Bayes rule, one will often 
require a finer analysis which now, given the results in Theorems 3.1 and 3.2, 
depends on exploiting the specific features of $h$ and $\nu$.
Remark 4. If one sets $\rho(ds|y) := \rho(ds)$ such that $\int_0^\infty\rho(ds) = \infty$, and
specifies $\eta(dy)$ to be a probability measure, then the choice of $h(s,y,N) =
s/T$ for $T = \int_0^\infty\int_{\mathcal{Y}} sN(ds,dy) = \mu(\mathcal{Y})$ yields the homogeneous Poisson-Kingman
random probability measures. This class has been discussed in varying
generalities and contexts in, for instance, [18, 24, 35, 38, 39, 40]. The Dirichlet
process with total mass $\theta$ arises by the choice of $\rho(ds) = \theta s^{-1}e^{-s}\,ds$. Using
this choice, one can recover (8) from (24) or (18). More generally, using this
choice of $h$, one obtains the EPPF given by Pitman [39].
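
To make the gamma choice in Remark 4 concrete: since $\eta$ is a probability measure, applying (1) with $f(s,y) = \lambda s$ gives $E[e^{-\lambda T}] = \exp(-\int_0^\infty (1 - e^{-\lambda s})\rho(ds))$, and for $\rho(ds) = \theta s^{-1}e^{-s}\,ds$ the exponent equals $\theta\log(1+\lambda)$, so that $T$ is a gamma variable with shape $\theta$. The sketch below checks this exponent by numerical quadrature. The quadrature rule, the truncation of the integral at `upper`, and the values of `theta` and `lam` are illustrative assumptions.

```python
from math import exp, expm1, isclose, log


def simpson(func, a, b, intervals):
    """Composite Simpson rule on [a, b]; `intervals` must be even."""
    h = (b - a) / intervals
    total = func(a) + func(b)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


def laplace_exponent(lam, theta, upper=60.0, intervals=60000):
    """Integrate (1 - e^{-lam*s}) * theta * s^{-1} * e^{-s} numerically over (0, upper)."""
    def integrand(s):
        if s == 0.0:
            return theta * lam      # limit of the integrand as s -> 0
        return theta * (-expm1(-lam * s)) / s * exp(-s)
    return simpson(integrand, 0.0, upper, intervals)


theta, lam = 2.5, 1.3               # illustrative values only
numeric = laplace_exponent(lam, theta)
closed_form = theta * log(1.0 + lam)   # Laplace exponent of a gamma variable with shape theta
print(numeric, closed_form)
```

The same quadrature can be pointed at other Levy measures $\rho$, although a heavier singularity at zero, as in the stable case, would call for a more careful treatment near $s = 0$.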

Remark 5. James [20] shows that Doksum's [8] neutral to the right
processes can be obtained by the choice of $h(s,y,N) = se^{-Z(y-)}$, for $(s,y)$
in $[0,1] \times (0,\infty)$, where $Z(y-) = \int_0^\infty\int_0^1 I_{\{x<y\}}[-\log(1-u)]N(du,dx)$, where
now $\rho(ds|y)$ is a Levy measure on $[0,1]$ and $\eta$ is modeled as a cumulative
hazard. The work of James [20] is an example of the type of refined analysis
mentioned in Remark 3.

Remark 6. One may define analogues of Dirichlet process mixture models
(see [30]) by mixing $P$ or $P_\mu$ with a known density or probability mass
function. The posterior analysis of such models follows as a simple
consequence of Theorem 3.1 or Theorem 3.2 and Fubini's theorem. In particular,
$\mathbb{P}(d\mathbf{W}|\nu)$ plays the role of a mixing measure, in analogy to the
Blackwell-MacQueen distribution. A further generalization of these types of models is
given in [33]. However, structurally such models are more closely related to 
models we will describe in the next section. That is to say, their analysis 
does not follow directly from Theorem 3.1 or Theorem 3.2. 

4. Multiplicative intensity models and Levy Cox moving averages. Similar
to Lo and Weng [32] (see also [10]), one can define random hazard rates
or spatial intensities on a Polish space $\mathcal{X}$ as

(25) $\lambda(x) = \int_{\mathcal{Y}} k(x|y)\mu(dy)$,

where $k(x|y)$ denotes a known positive measurable kernel on a Polish space
$\mathcal{X} \times \mathcal{Y}$ assumed to be $\eta$-integrable over $\mathcal{Y}$. Additionally, $k$ is chosen such that,
for a sigma-finite measure $\tau$ on $\mathcal{X}$ and each bounded set $B$, $\int_B k(x|y)\tau(dx) <
\infty$ for each fixed $y$. Under this condition one may define a random cumulative
intensity for each bounded set $B$ as

(26) $\Lambda(B) = \int_B\lambda(x)\tau(dx) = \int_{\mathcal{Y}}\left[\int_B k(x|y)\tau(dx)\right]\mu(dy)$.

The models (26) are also known as Levy-Cox moving average models as
discussed in [41, 42]. The models (25) can be used to model intensities
of counting process models, or hazard rates of distribution functions. In
particular, if $\mathcal{X} = (0,\infty)$, then one can define a random density $f$ as

(27) $f(x|\lambda) = e^{-\Lambda(x)}\lambda(x) = S(x|\lambda)\lambda(x)$,

where $\Lambda(x) = \int_0^x\lambda(v)\,dv = \int_{\mathcal{Y}}[\int_0^x k(v|y)\,dv]\mu(dy)$ is a cumulative hazard and
$S(x|\lambda) := e^{-\Lambda(x)}$ is the survival function denoting the probability that a
random variable $X_1 > x$. We, of course, assume that $\Lambda(\infty) = \infty$. We will
provide a detailed posterior analysis of the general class of Levy moving
averages assuming a multiplicative intensity likelihood, which we now
describe. Suppose, as in [2], that, for each $i = 1,\ldots,m$, and fixed $\mu$, there is an
independent counting process with mean intensity $\lambda(x)U_i(x)$, where $U_i(x)$ is
a predictable process which is observable. We discuss some specific
interpretations of this function below. Under this assumption the counting processes
correspond to classes of multiplicative intensity models as discussed in [1].
Jacod [17] (see also [2, 32]) showed that the likelihood of such counting
processes is absolutely continuous to the likelihood of Poisson process models.
Here, for $n \le m$, we work with the multiplicative intensity likelihood with a
random intensity (25) which can be represented as

(28) $\mathcal{L}(\mathbf{X}|\mu) = e^{-\mu(g_m)}\prod_{i=1}^{n}\int_{\mathcal{Y}} k(X_i|Y_i)\mu(dY_i)$,

where $g_m(y) = \sum_{i=1}^{m}\int_{\mathcal{X}} U_i(x)k(x|y)\tau(dx)$ and, hence,

$\mu(g_m) = \int_{\mathcal{X}}\left[\sum_{i=1}^{m} U_i(x)\right]\lambda(x)\tau(dx)$.



Note that throughout we assume that $k$ and $(U_i)$ are chosen such that $g_m$ is
in $BM_+(\mathcal{Y})$. The model (28) suggests that there are $X_1,\ldots,X_n$ completely
observed points and $m - n$ points, say $X_{n+1},\ldots,X_m$, which are partially
observed. Meanwhile, $\mathbf{Y} = (Y_1,\ldots,Y_n)$ can be viewed as missing data. The
multiplicative intensity likelihood captures a large variety of models which
appear in event history analysis. For example, if $\mathcal{X} = (0,\infty)$ and one sets
$U_i(x) = I_{\{X_i > x\}}I_{\{x \in B_i\}}$ for a random set $B_i$ independent of $X_i$, then one
can use this to model various censoring mechanisms. Specifically, setting
$B_i = [0,D_i]$ for a random variable $D_i$ corresponds to a right censoring model.
An extension to left truncation and right censored models is given by the
choice $B_i = (V_i,D_i]$, where $V_i$ is a random variable almost surely less than $D_i$
(see [2], Section III.2). On the other hand, setting $\sum_{i=1}^{m} U_i(x) = 1$ leads
to the likelihood of an inhomogeneous Poisson process with mean intensity
$\lambda(x)\tau(dx)$. Before proceeding to the posterior analysis, we first describe some
more details about the special case of the class of random distributions
defined by (27).
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4.1. Random hazard rates and densities. Some specific examples of ker- 
nels k used to define hazard rates A include the Dykstra and Laud [10] 
kernel, which corresponds to k(x\y) = I^y^x}^ where it follows that 

K{t\y):= I k{x\y)dx = {t-y)Iiy<t} and 
(29) '° 

A(t) = / (t - y)I{y<:t}Kdy) 



for t > 0. This choice of k generates the family of nondecreasing hazard 
rates. Dragichi and Ramamoorthi [9] establish the consistency of this class 
of random hazard rates under wide choices of fi. If one chooses an exponential 
kernel k(x\y) = e~^^, then 

K{t\y)= / e-''ydx = y~^{l-e-y^) and A(t) = / y-^(l - e-^*)/i(dy). 



As discussed in [32], this induces hazard rates which are completely monotone.
See [34] for a variation of this model. If one is unsure of the shape of
the hazard, then one can use any of the convolution kernels that one finds in
classical kernel based density estimation, where, for
$y = (m,\sigma) \in (-\infty,\infty)\times(0,\infty)$, a fairly simple choice is the
rectangular kernel $k(x|m,\sigma) = I\{|x-m| \le \sigma\}$.
See [16, 32, 41, 42] for various choices of $k$ on the real line and for spatial
models. Notice that for a random variable $T$ the quantity $\lambda(t)$ represents
the hazard rate of $T$ given $\mu$, that is,

\[ \lambda(t)\,dt = P(t \le T < t + dt \mid T \ge t, \mu). \]
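The closed forms for the integrated kernels $K(t|y)$ quoted above can be checked numerically. The following is a minimal sketch, assuming a hypothetical discrete realization $\mu = \sum_j s_j \delta_{y_j}$ (the `atoms` list is made up for illustration and is not taken from the paper); the function names are likewise illustrative.

```python
import math

def dykstra_laud_K(t, y):
    """Closed form K(t|y) = (t - y) I{y <= t} for the kernel k(x|y) = I{y <= x}."""
    return max(t - y, 0.0)

def exponential_K(t, y):
    """Closed form K(t|y) = (1 - exp(-y t)) / y for the kernel k(x|y) = exp(-x y)."""
    return (1.0 - math.exp(-y * t)) / y

def integrate_kernel(k, t, y, steps=20000):
    """Midpoint-rule approximation of the integral of k(x|y) over x in (0, t)."""
    h = t / steps
    return h * sum(k((i + 0.5) * h, y) for i in range(steps))

def cumulative_hazard(K, t, atoms):
    """Lambda(t) = sum_j s_j K(t|y_j) for a discrete mu = sum_j s_j delta_{y_j}."""
    return sum(s * K(t, y) for s, y in atoms)

dl_kernel = lambda x, y: 1.0 if y <= x else 0.0
exp_kernel = lambda x, y: math.exp(-x * y)

# Hypothetical (jump size, location) pairs standing in for a realization of mu.
atoms = [(0.5, 0.3), (1.2, 1.0), (0.7, 2.5)]
print(cumulative_hazard(dykstra_laud_K, 2.0, atoms))
print(cumulative_hazard(exponential_K, 2.0, atoms))
```

Under the Dykstra and Laud kernel an atom located at $y_j$ contributes to $\Lambda(t)$ only once $t$ exceeds $y_j$, whereas under the exponential kernel every atom contributes from time zero.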

Note, however, that the quantity $E[\lambda(t)]$ does not have the interpretation
as a prior specification for the hazard rate. For instance, in the case of the
stable law of index $0 < \alpha < 1$, one has

\[ E[\lambda(t)] = \int_{\mathcal{Y}} k(t|y) E[\mu(dy)]
   = \int_{\mathcal{Y}} k(t|y) \Bigl[\int_0^\infty \frac{s^{-\alpha}}{\Gamma(1-\alpha)}\,ds\Bigr] \eta(dy) = \infty, \]



and we see that it is possible that $E[\lambda(t)] = \infty$ for all $t$. It follows that
to appropriately evaluate the marginal hazard rate of $T$, one needs to first
find the distribution of $\mu$ or $N$, given $T > t$. Setting $U_1(x) = I\{x \le t\}$, we
have $g_1(y) = \int_0^t k(x|y)\,dx := K(t|y)$. Hence, setting $f_1(s,y) = g_1(y)s$, it
follows that $S(t|\lambda) = e^{-N(f_1)}$ and an application of Proposition 2.1 gives

\[ S(t|\lambda)\,\mathcal{P}(dN|\nu) = \mathcal{P}(dN|e^{-f_1}\nu)\,E[S(t|\lambda)], \]

where

\[ E[S(t|\lambda)] = \mathcal{L}_N(f_1|\nu)
   = \exp\Bigl(-\int_{\mathcal{Y}}\int_0^\infty (1 - e^{-K(t|y)s})\,\rho(ds|y)\,\eta(dy)\Bigr) \]

denotes the marginal survival function of $T$. The quantity $\mathcal{P}(dN|e^{-f_1}\nu)$
denotes the law of a Poisson random measure with mean intensity
$e^{-K(t|y)s}\rho(ds|y)\eta(dy)$ and represents the posterior distribution of $N|T > t$.
The marginal hazard rate is obtained as

\[ E[\lambda(t)|T > t] = \int_{\mathcal{M}} \lambda(t)\,\mathcal{P}(dN|e^{-f_1}\nu)
   = \int_{\mathcal{Y}} k(t|y) \Bigl[\int_0^\infty s\,e^{-K(t|y)s}\,\rho(ds|y)\Bigr] \eta(dy). \]
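As a numerical illustration of the marginal survival and hazard formulas, the sketch below assumes a stable Lévy density $\rho(ds) = s^{-\alpha-1}/\Gamma(1-\alpha)\,ds$ (this normalization is an assumption chosen so that $\int_0^\infty (1-e^{-us})\rho(ds) = u^\alpha/\alpha$), the Dykstra and Laud kernel, and $\eta(dy) = dy$. Under these assumptions $-\log E[S(t|\lambda)] = \int_0^t (t-y)^\alpha/\alpha\,dy$, which the code compares with the Weibull-type closed form $t^{\alpha+1}/(\alpha(\alpha+1))$. The value of `ALPHA` and all function names are hypothetical.

```python
import math

ALPHA = 0.5  # hypothetical stable index, 0 < ALPHA < 1

def laplace_exponent(u, alpha=ALPHA):
    """psi(u) = int (1 - e^{-us}) rho(ds) = u^alpha / alpha for the assumed stable density."""
    return u ** alpha / alpha

def cumulative_hazard(t, alpha=ALPHA, steps=20000):
    """-log E[S(t)] = int_0^t psi(K(t|y)) dy with K(t|y) = t - y, by the midpoint rule."""
    h = t / steps
    return h * sum(laplace_exponent(t - (i + 0.5) * h, alpha) for i in range(steps))

def marginal_hazard(t, alpha=ALPHA, eps=1e-4):
    """Central-difference derivative of the marginal cumulative hazard."""
    return (cumulative_hazard(t + eps, alpha) - cumulative_hazard(t - eps, alpha)) / (2 * eps)

def weibull_cumulative_hazard(t, alpha=ALPHA):
    """Closed form t^(alpha+1) / (alpha (alpha+1))."""
    return t ** (alpha + 1) / (alpha * (alpha + 1))

def weibull_hazard(t, alpha=ALPHA):
    """Closed form t^alpha / alpha."""
    return t ** alpha / alpha

print(cumulative_hazard(2.0), weibull_cumulative_hazard(2.0))
print(marginal_hazard(2.0), weibull_hazard(2.0))
```

The hazard is obtained here by differentiating the cumulative hazard numerically, which avoids integrating the singular integrand $(t-y)^{\alpha-1}$ directly.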



In the stable case the marginal hazard rate becomes
$\int_{\mathcal{Y}} k(t|y)[K(t|y)]^{\alpha-1}\eta(dy)$.
Noting the specifications for the Dykstra and Laud kernel in (29), in the
stable case with $\eta(dy) = dy$, the prior predictive hazard rate and survival
function are

\[ \lambda_{0,\alpha}(t|DL) = \frac{t^{\alpha}}{\alpha} \quad\text{and}\quad
   S_{0,\alpha}(t|DL) = \exp\Bigl(-\frac{t^{\alpha+1}}{\alpha(\alpha+1)}\Bigr), \]

which corresponds to a Weibull distribution. We now show that a likelihood
for this model based on right censored data is a special case of (28). Suppose
that $T_1,\ldots,T_n|\mu$ are i.i.d. random variables with density $f(t|\lambda)$. Then their
joint density can be expressed as
$\prod_{i=1}^n f(T_i|\lambda) = \prod_{i=1}^n S(T_i|\lambda)\lambda(T_i)$. If there are
additionally $T_{n+1},\ldots,T_m$ random times which are right censored by random
times $D_{n+1},\ldots,D_m$, that is, $T_l > D_l$ for $l = n+1,\ldots,m$, where we assume
that the distribution of the censoring times does not depend on $\mu$, then
the likelihood of $\mu$ based on $n$ completely observed times and $m-n$ right
censored times takes the form



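\[ \Bigl[\prod_{l=n+1}^m S(D_l|\lambda)\Bigr] \prod_{i=1}^n S(T_i|\lambda)\lambda(T_i)
   = \Bigl[\prod_{i=1}^m S(\min(T_i,D_i)|\lambda)\Bigr] \prod_{i=1}^n \lambda(T_i), \tag{30} \]

where we set $D_i = \infty$ for $i = 1,\ldots,n$. Setting
$U_i(x) = I\{x \le \min(T_i,D_i)\} = I\{T_i \ge x\} I\{x \le D_i\}$
for $i = 1,\ldots,m$, one can write

\[ \prod_{i=1}^m S(\min(T_i,D_i)|\lambda) = e^{-\mu(g_m)}, \]

where, in this case, $g_m(y) = \sum_{i=1}^m \int_0^{\min(T_i,D_i)} k(x|y)\,dx$. Hence, it is not
difficult to see that (30) is a special case of (28) with
$\mu(g_m) = \sum_{i=1}^m \Lambda(\min(T_i,D_i))$.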
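The identity $\mu(g_m) = \sum_{i=1}^m \Lambda(\min(T_i, D_i))$ is an interchange of a sum and an integral, and it can be checked directly for a discrete measure. The sketch below assumes the Dykstra and Laud kernel and hypothetical data: two observed times (with $D_i = \infty$) and one time censored at $1.8$, for which only $\min(T_l, D_l) = D_l$ is known and $T_l$ is therefore coded as infinity. The atoms of $\mu$ are made up for illustration.

```python
import math

def exposure_g(y, times, censor_times):
    """g_m(y) = sum_i int_0^{min(T_i, D_i)} I{y <= x} dx for the Dykstra-Laud kernel."""
    return sum(max(min(t, d) - y, 0.0) for t, d in zip(times, censor_times))

def cumulative_hazard(t, atoms):
    """Lambda(t) = sum_j s_j (t - y_j)^+ for a discrete mu = sum_j s_j delta_{y_j}."""
    return sum(s * max(t - y, 0.0) for s, y in atoms)

# Hypothetical data: T_1, T_2 observed; the third time is censored at D_3 = 1.8.
times = [1.0, 2.5, math.inf]
censor_times = [math.inf, math.inf, 1.8]
atoms = [(0.5, 0.3), (1.2, 1.0), (0.7, 2.2)]   # hypothetical (jump, location) pairs

mu_g = sum(s * exposure_g(y, times, censor_times) for s, y in atoms)
sum_lambda = sum(cumulative_hazard(min(t, d), atoms) for t, d in zip(times, censor_times))
print(mu_g, sum_lambda)
```

Both quantities equal the exponent in $e^{-\mu(g_m)}$, the survival part of the likelihood (30).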



4.2. Posterior analysis of Lévy moving averages. We now show how
Proposition 2.3 is used to obtain the posterior distributional properties of
the class of Lévy moving averages under the multiplicative intensity model.
Here we actually focus on $\mu$. The approach used has similarities to that of
Lo and Weng [32] in the case of weighted gamma processes. The analysis
proceeds, as in the proof of Theorem 3.2, by introducing a suitable
augmentation and then establishing the appropriate results for $N$. First, setting
$f_{k,m}(s,y) = g_m(y)s$, it follows that $N(f_{k,m}) = \mu(g_m)$. We now provide some
notation which will be used in the description of the posterior distribution.
Throughout we assume, for integers $l \le m$ and fixed $y$, the condition

\[ \kappa_l(e^{-f_{k,m}}\rho|y) = \int_0^\infty s^l e^{-g_m(y)s}\,\rho(ds|y) < \infty. \tag{31} \]



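Define $C(\mathbf{X}) = \sum_{\mathbf{p}} \prod_{j=1}^{n(\mathbf{p})} \int_{\mathcal{Y}} [\prod_{i\in C_j} k(X_i|y)]\,\kappa_{e_{j,n}}(e^{-f_{k,m}}\rho|y)\,\eta(dy)$,
where $e_{j,n}$ denotes the cardinality of $C_j$. Additionally,
for $j = 1,\ldots,n(\mathbf{p})$, define distributions of the unique jumps $J_{j,n}$, each
depending on a corresponding $Y_j^*$, as

\[ P(J_{j,n} \in ds|Y_j^*) = \frac{s^{e_{j,n}} e^{-g_m(Y_j^*)s}\,\rho(ds|Y_j^*)}{\kappa_{e_{j,n}}(e^{-f_{k,m}}\rho|Y_j^*)}. \tag{32} \]

Using Proposition 2.3 and straightforward algebraic manipulations, that
is, an appeal to Bayes rule, one arrives at the following description of the
posterior distribution of $\mu$, given $\mathbf{X}$ and related quantities.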

Theorem 4.1. Let $\mu(dy) = \int_0^\infty sN(ds,dy)$ denote a completely random
measure on a Polish space $\mathcal{Y}$ with law determined by the law of the Poisson
random measure $N$ with mean $\nu(ds,dy) = \rho(ds|y)\eta(dy)$ on $\mathcal{S}\times\mathcal{Y}$. Suppose
that $\mathbf{X}|\mu$ has the multiplicative intensity likelihood specified in (28). Then
the posterior distribution of $\mu|\mathbf{X}$ can be described in terms of the posterior
distribution of $\mu|\mathbf{Y},\mathbf{X}$ mixed over the posterior distribution of $\mathbf{Y}|\mathbf{X}$, which
is described as follows:

(i) The posterior distribution of $N|\mathbf{Y},\mathbf{X}$ is equivalent to the conditional
law of the random measure
$N^*_{n,m}(ds,dy) = N_{f_{k,m}}(ds,dy) + \sum_{j=1}^{n(\mathbf{p})} \delta_{J_{j,n},Y_j^*}(ds,dy)$,
where conditional on $(\mathbf{J},\mathbf{Y},\mathbf{X})$, $N_{f_{k,m}}$ is a Poisson random measure with
intensity

\[ E[N_{f_{k,m}}(ds,dy)] = e^{-f_{k,m}(s,y)}\nu(ds,dy) = e^{-g_m(y)s}\rho(ds|y)\eta(dy), \tag{33} \]

not depending on $(J_{j,n})$. Additionally, given $(\mathbf{Y},\mathbf{X})$, the $(J_{j,n})$ are
conditionally independent of $N_{f_{k,m}}$ and are mutually independent with each $J_{j,n}$
having the distribution depending on $Y_j^*$ specified in (32).

(ii) Statement (i) implies that $\mu|\mathbf{Y},\mathbf{X}$ is equivalent to the conditional
distribution, given $(\mathbf{Y},\mathbf{X})$, of the random measure

\[ \mu^*_{n,m}(dy) = \int_0^\infty sN^*_{n,m}(ds,dy) = \mu_{g_m}(dy) + \sum_{j=1}^{n(\mathbf{p})} J_{j,n}\delta_{Y_j^*}(dy), \]

where conditional on $\mathbf{Y}$ and $\mathbf{X}$, $\mu_{g_m}(dy) := \int_0^\infty sN_{f_{k,m}}(ds,dy)$ is a completely
random measure with Lévy measure specified in (33). Additionally, the $(J_{j,n})$
are conditionally independent of $\mu_{g_m}$.

(iii) If $\lambda$ is a random hazard rate or intensity defined in (25), then its
posterior distribution, given $(\mathbf{Y},\mathbf{X})$, is equivalent to the conditional distribution
of the random measure

\[ \int_{\mathcal{Y}} k(x|y)\,\mu^*_{n,m}(dy) = \int_{\mathcal{Y}} k(x|y)\,\mu_{g_m}(dy) + \sum_{j=1}^{n(\mathbf{p})} J_{j,n} k(x|Y_j^*). \]

(iv) The conditional distribution of $\mathbf{Y}|\mathbf{X}$ can be expressed via the
conditional distributions of $\mathbf{Y}|\mathbf{p},\mathbf{X}$ and $\mathbf{p}|\mathbf{X}$ as follows: The distribution of $\mathbf{Y}|\mathbf{p},\mathbf{X}$
is such that the unique values of $\mathbf{Y}$, $Y_1^*,\ldots,Y^*_{n(\mathbf{p})}$, are conditionally
independent with distributions

\[ P(dY_j^*|\mathbf{p},\mathbf{X}) := \pi(dY_j^*|C_j) \propto
   \Bigl[\prod_{i\in C_j} k(X_i|Y_j^*)\Bigr] \kappa_{e_{j,n}}(e^{-f_{k,m}}\rho|Y_j^*)\,\eta(dY_j^*). \tag{34} \]

$\pi(\mathbf{p}|\mathbf{X}) = [C(\mathbf{X})]^{-1} \prod_{j=1}^{n(\mathbf{p})} \int_{\mathcal{Y}} [\prod_{i\in C_j} k(X_i|y)]\,\kappa_{e_{j,n}}(e^{-f_{k,m}}\rho|y)\,\eta(dy)$
is the posterior distribution of $\mathbf{p}|\mathbf{X}$.



Proof. Similar to the proof of Theorem 3.2, we work with an (augmented)
joint distribution of $(\mathbf{X}, N)$. Removing the integrals in $L(\mathbf{X}|\mu)$ and
making appropriate substitutions, it follows that a distribution of $(\mathbf{J},\mathbf{Y},N,\mathbf{X})$
is proportional to

\[ e^{-\mu(g_m)} \Bigl[\prod_{i=1}^n k(X_i|Y_i) J_i\Bigr] \Bigl[\prod_{i=1}^n N(dJ_i,dY_i)\Bigr] \mathcal{P}(dN|\nu). \tag{35} \]

Using the identity
$\prod_{i=1}^n k(X_i|Y_i) J_i = \prod_{j=1}^{n(\mathbf{p})} [\prod_{i\in C_j} k(X_i|Y_j^*)] J_{j,n}^{e_{j,n}}$,
combined with an application of Proposition 2.3 to (35), shows that the joint
distribution of $(\mathbf{J},\mathbf{Y},N,\mathbf{X})$ is proportional to

\[ \mathcal{L}_N(f_{k,m}|\nu)\,\mathcal{P}(dN|e^{-f_{k,m}}\nu,\mathbf{J},\mathbf{Y})
   \prod_{j=1}^{n(\mathbf{p})} \Bigl[\prod_{i\in C_j} k(X_i|Y_j^*)\Bigr]
   J_{j,n}^{e_{j,n}} e^{-g_m(Y_j^*)J_{j,n}} \rho(dJ_{j,n}|Y_j^*)\,\eta(dY_j^*), \tag{36} \]

where $\mathcal{P}(dN|e^{-f_{k,m}}\nu,\mathbf{J},\mathbf{Y})$ corresponds to the conditional law, given $(\mathbf{J},\mathbf{Y},\mathbf{X})$,
of the random measure
$N^*_{n,m}(ds,dy) = N_{f_{k,m}}(ds,dy) + \sum_{j=1}^{n(\mathbf{p})} \delta_{J_{j,n},Y_j^*}(ds,dy)$
described in statement (i). The distribution of $\mathbf{J}|\mathbf{Y},\mathbf{X}$ is then obtained by
integrating out $N$ in (36) and applying Bayes rule, using the finiteness
condition (31). A similar procedure yields the distributions of $\mathbf{Y}|\mathbf{X}$. □

Remark 7. Note that the law of $N_{f_{k,m}}$ is also determined by first
applying Proposition 2.1 to (35) to obtain

\[ e^{-\mu(g_m)}\mathcal{P}(dN|\nu) = \mathcal{P}(dN|e^{-f_{k,m}}\nu)\,\mathcal{L}_N(f_{k,m}|\nu). \]

See [20] for a similar type of calculation for spatial NTR processes. Notice
also that, conditional on $(\mathbf{J},\mathbf{Y},\mathbf{X})$, the dependence of $N_{f_{k,m}}$ (and $N^*_{n,m}$) on
$\mathbf{X}$ is only through the function $f_{k,m}$.

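Remark 8. The marginal distribution of $\mathbf{Y}|\mathbf{X}$ can also be written as

\[ \pi(d\mathbf{Y}|\mathbf{X}) = [C(\mathbf{X})]^{-1} \Bigl[\prod_{i=1}^n k(X_i|Y_i)\Bigr] M_\mu(d\mathbf{Y}|e^{-f_{k,m}}\nu), \tag{37} \]

where

\[ M_\mu(d\mathbf{Y}|e^{-f_{k,m}}\nu) = \prod_{j=1}^{n(\mathbf{p})} \kappa_{e_{j,n}}(e^{-f_{k,m}}\rho|Y_j^*)\,\eta(dY_j^*)
   = \int_{\mathcal{M}} \Bigl[\prod_{i=1}^n \mu(dY_i)\Bigr] \mathcal{P}(dN|e^{-f_{k,m}}\nu) \tag{38} \]

assumes a role analogous to the Blackwell-MacQueen Pólya urn distribution
in Dirichlet process mixture models. This viewpoint becomes important
when designing computational procedures.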

Remark 9. James [19] gives results for semi-parametric weighted gamma
process mixture models under more complex multiplicative intensity
structures, that is, for cases where the kernel $k$ depends on a Euclidean parameter
$\beta$, and $\beta$ has prior distribution $\pi(d\beta)$. A careful examination of that work,
coupled with the results given here, provides an obvious way to obtain the
corresponding result for the general processes, via a straightforward
application of Bayes rule. A notable wrinkle is that the Laplace functionals will
depend on $\beta$, and, hence, one does not have the cancellation of the
semi-parametric version of $\mathcal{L}_N(f_{k,m}|\nu)$. A discussion of this is omitted for brevity.
See [16] for further details in the case of the weighted gamma process.

4.3. Posterior intensity rates and predictive hazards. Similar to the case
of Dirichlet process mixture models, many posterior quantities can be
expressed in terms of functionals of the missing values $\mathbf{Y}$ or the partition $\mathbf{p}$.
For example, the posterior intensity rate depends upon the posterior mean
for $\mu$. From Theorem 4.1, it follows that the posterior mean of $\mu|\mathbf{X},\mathbf{Y}$ is
given by

\[ E[\mu^*_{n,m}(dy)|\mathbf{X},\mathbf{Y}] = \kappa_1(e^{-f_{k,m}}\rho|y)\,\eta(dy)
   + \sum_{j=1}^{n(\mathbf{p})} E[J_{j,n}|Y_j^*]\,\delta_{Y_j^*}(dy), \tag{39} \]

where

\[ E[J_{j,n}|Y_j^*] = \frac{\int_0^\infty s^{e_{j,n}+1} e^{-g_m(Y_j^*)s}\,\rho(ds|Y_j^*)}
   {\int_0^\infty u^{e_{j,n}} e^{-g_m(Y_j^*)u}\,\rho(du|Y_j^*)}. \]

The quantity (39) is also the conditional moment measure of $\mu_{g_m}$, given
$(\mathbf{Y},\mathbf{X})$. Using these expressions, we obtain the following generalization of
Lo and Weng ([32], Theorem 4.2).

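Corollary 4.1. Theorem 4.1 implies that the posterior expectation of
the intensity (25), given $\mathbf{X}$ and $\mathbf{Y}$, is

\[ E[\lambda(x)|\mathbf{Y},\mathbf{X}] = \int_{\mathcal{Y}} k(x|y)\,\kappa_1(e^{-f_{k,m}}\rho|y)\,\eta(dy)
   + \sum_{j=1}^{n(\mathbf{p})} k(x|Y_j^*)\,E[J_{j,n}|Y_j^*] \]

and, hence, the posterior expectation given $\mathbf{X}$ is

\[ E[\lambda(x)|\mathbf{X}] = \sum_{\mathbf{p}} \Bigl[\int_{\mathcal{Y}} k(x|y)\,\kappa_1(e^{-f_{k,m}}\rho|y)\,\eta(dy)
   + \sum_{j=1}^{n(\mathbf{p})} \int_{\mathcal{Y}} k(x|y)
     \frac{\kappa_{e_{j,n}+1}(e^{-f_{k,m}}\rho|y)}{\kappa_{e_{j,n}}(e^{-f_{k,m}}\rho|y)}\,\pi(dy|C_j)\Bigr] \pi(\mathbf{p}|\mathbf{X}). \]

Note, importantly, that a predictive hazard rate is defined as $E[\lambda(X_{n+1})|\mathbf{X}]$.

Remark 10. Corollary 4.1 shows that the posterior mean for the intensity
rate can be estimated from Monte Carlo draws involving only $\mathbf{p}$ and $\mathbf{Y}^*$.
Thus, in problems where inference focuses on estimating the intensity, there
is no need to draw values from the posterior of $\mu$. From a computational
perspective this can greatly simplify algorithms.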

4.4. Monte Carlo procedures. Ishwaran and James [16] show that efficient
sampling schemes used to approximate the posterior distributional
properties of Dirichlet process mixture models can be applied with some
modification to sample the posterior distribution of mixtures of weighted
gamma processes in the present setting. A key point was to note the
similarities between the distribution of $\mathbf{Y}|\mathbf{X}$ for Dirichlet process models relative
to the Blackwell-MacQueen urn and (37) in the case of the weighted gamma
process. Lo and Weng [32] and Lo, Brunner and Chan [31] also exploited
this idea. Here we note that the explicit expression of (38) and its
description in Theorem 4.1, for general processes $\mu$, allows one to extend some of
these procedures. First note that if one wants to sample $\mu|\mathbf{X}$, one can
obtain a draw from $\mathbf{Y}|\mathbf{X}$ and then draw from the distribution of $\mu^*_{n,m}|\mathbf{X},\mathbf{Y}$
described in (ii) of Theorem 4.1. Here we give some ideas on how to sample
from $\mathbf{Y}|\mathbf{X}$, noting that steps such as draws from $\mu|\mathbf{Y},\mathbf{X}$ are natural
additions. For brevity, we only sketch out some details, focusing on identifying
the relevant probabilities, as one can deduce the operational formalities
either from [15, 16] or other relevant cited works. Note that (38) is the
$n$th moment measure of a completely random measure with Lévy measure
specified in (33). That is, (38) is the $n$th moment measure of $\mu_{g_m}$ described
in (ii) of Theorem 4.1. It follows that (38) can also be represented via its
conditional moment measures [see (39)] as

\[ \kappa_1(e^{-f_{k,m}}\rho|Y_1)\,\eta(dY_1) \prod_{r=1}^{n-1} \Bigl[\kappa_1(e^{-f_{k,m}}\rho|Y_{r+1})\,\eta(dY_{r+1})
   + \sum_{j=1}^{n(\mathbf{p}_r)} \frac{\kappa_{e_{j,r}+1}(e^{-f_{k,m}}\rho|Y_j^*)}{\kappa_{e_{j,r}}(e^{-f_{k,m}}\rho|Y_j^*)}\,\delta_{Y_j^*}(dY_{r+1})\Bigr], \]

where $\mathbf{p}_r = \{C_{1,r},\ldots,C_{n(\mathbf{p}_r),r}\}$ is the partition of $\{1,\ldots,r\}$ encoding the ties
in the first $r$ observations $\mathbf{Y}_r = (Y_1,\ldots,Y_r)$ and $e_{j,r}$ is the cardinality of $C_{j,r}$.
In order to simulate $\mathbf{Y}$ from (37), one can construct an analogue of the Pólya
urn Gibbs sampler of Escobar [11] or sequential importance sampler (SIS)
of Liu [28] by working with a density constructed from $E[\lambda(x)|\mathbf{Y},\mathbf{X}]$. These
procedures are duals. We first describe the idea for the SIS procedure. This
procedure samples $Y_1,\ldots,Y_n$ sequentially based on the conditional densities,
for $r = 0,\ldots,n-1$,

\[ P(Y_{r+1} \in dy|\mathbf{Y}_r,\mathbf{X}) = \frac{l_{0,r}}{c_r}\,\lambda_r(dy)
   + \sum_{j=1}^{n(\mathbf{p}_r)} \frac{l_{j,r}(Y_j^*)}{c_r}\,\delta_{Y_j^*}(dy), \tag{40} \]

where $\lambda_r(dy) \propto k(X_{r+1}|y)\,\kappa_1(e^{-f_{k,m}}\rho|y)\,\eta(dy)$ and

\[ l_{0,r} = \int_{\mathcal{Y}} k(X_{r+1}|y)\,\kappa_1(e^{-f_{k,m}}\rho|y)\,\eta(dy) \]

and

\[ l_{j,r}(Y_j^*) = k(X_{r+1}|Y_j^*)\,\frac{\kappa_{1+e_{j,r}}(e^{-f_{k,m}}\rho|Y_j^*)}{\kappa_{e_{j,r}}(e^{-f_{k,m}}\rho|Y_j^*)}. \]

Furthermore, $c_r = l_{0,r} + \sum_{j=1}^{n(\mathbf{p}_r)} l_{j,r}(Y_j^*)$. The importance weights for this
scheme are $\prod_{r=0}^{n-1} c_r$. Now, for $r = 0,\ldots,n-1$, let $\mathbf{Y}_{-(r+1)}$ denote the
collection of $n-1$ random variables determined by removing $Y_{r+1}$ from
$(Y_1,\ldots,Y_n)$. A general analogue of the Pólya urn Gibbs sampler for
generating $Y_1,\ldots,Y_n$ is implemented by drawing values $Y_{r+1}$ from the
probabilities $P(Y_{r+1} \in dy|\mathbf{Y}_{-(r+1)},\mathbf{X})$ for $r = 0,\ldots,n-1$. These probabilities are
defined analogously to (40), where $\mathbf{Y}_{-(r+1)}$ plays the role of $\mathbf{Y}_r$. See [16]
for more details in the case of the weighted gamma process.

As in the case of Dirichlet process mixture models, the SIS and Gibbs
sampling procedures described above are attractive as one does not need to
perform complex integration. However, if integration is manageable, then,
due to a Rao-Blackwellization argument, it is generally better to apply the
following new variation of the general weighted Chinese restaurant (WCR)
algorithms discussed in [15, 31]. We will describe an SIS procedure which has a
dual Gibbs sampling procedure analogous to the collapsed Gibbs samplers.
The key to the procedure is to generate partitions $\mathbf{p}$ based on probabilities
defined using the predictive hazard rate. That is, for $r = 0,\ldots,n-1$, define

\[ l(r) = l_{0,r} + \sum_{j=1}^{n(\mathbf{p}_r)} l_{j,r}, \]

where $l_{j,r} = \int_{\mathcal{Y}} l_{j,r}(y)\,\pi(dy|C_{j,r})$. The distribution $\pi(dy|C_{j,r})$ is the
distribution for the $j$th unique value, given $C_{j,r}$, defined similarly to (34). The special
case when $r = 0$ corresponds to

\[ l(0) = l_{0,0} = \int_{\mathcal{Y}} k(X_1|y)\,\kappa_1(e^{-f_{k,m}}\rho|y)\,\eta(dy). \]

By Corollary 4.1 it follows that $l(r)$ is the predictive hazard rate given
$X_1,\ldots,X_r$ and $\mathbf{p}_r$. From this, it is possible to define a sequential algorithm
to generate an importance draw for $\mathbf{p}$ from the posterior. The method can
be described in terms of $n$ customers who enter a restaurant sequentially,
similar to the class of WCR algorithms. However, now the role played by
the EPPF for random probability measures in such algorithms is replaced
by cumulants, $\kappa$, arising from Lévy measures. The first customer is seated
to a table with probability $l(0)/l(0) = 1$. Now at step $r+1$, given a
configuration $\mathbf{p}_r = \{C_{1,r},\ldots,C_{n(\mathbf{p}_r),r}\}$ of the integers $\{1,\ldots,r\}$, one determines the
partition $\mathbf{p}_{r+1}$ by noting whether a customer $r+1$ sits at a new table or
sits at one of the existing tables $C_{j,r}$ for $j = 1,\ldots,n(\mathbf{p}_r)$. The seating rule
is defined as follows. To seat customer $r+1$, sit him at an occupied table
$C_{j,r}$ with probability $\Pr(\mathbf{p}_{r+1}|\mathbf{p}_r) = l(r)^{-1} l_{j,r}$, where
$\mathbf{p}_{r+1} = \mathbf{p}_r \cup \{r+1 \in C_{j,r}\}$ for $j = 1,\ldots,n(\mathbf{p}_r)$. Otherwise, customer $r+1$ sits at a new
unoccupied table $C_{n(\mathbf{p}_r)+1}$ with probability $\Pr(\mathbf{p}_{r+1}|\mathbf{p}_r) = l(r)^{-1} l_{0,r}$, where
$\mathbf{p}_{r+1} = \mathbf{p}_r \cup C_{n(\mathbf{p}_r)+1}$. After $n$ customers are seated, the algorithm will yield
a partition $\mathbf{p} = \{C_1,\ldots,C_{n(\mathbf{p})}\}$ of $\{1,\ldots,n\}$. By James ([18], Lemma 2.3),
this partition has density $q(\mathbf{p})$ satisfying

\[ L(\mathbf{p})\,q(\mathbf{p}) = \prod_{j=1}^{n(\mathbf{p})} \int_{\mathcal{Y}} \Bigl[\prod_{i\in C_j} k(X_i|y)\Bigr] \kappa_{e_{j,n}}(e^{-f_{k,m}}\rho|y)\,\eta(dy), \]

where $L(\mathbf{p}) = \prod_{r=1}^n l(r-1)$. In other words, for any integrable function $t(\mathbf{p})$,

\[ \sum_{\mathbf{p}} t(\mathbf{p})\,\pi(\mathbf{p}|\mathbf{X}) = \frac{\sum_{\mathbf{p}} t(\mathbf{p}) L(\mathbf{p}) q(\mathbf{p})}{\sum_{\mathbf{p}} L(\mathbf{p}) q(\mathbf{p})}. \]

Thus, $q(\mathbf{p})$ is an importance density for drawing posterior values $\mathbf{p}$ with
importance values $L(\mathbf{p})$. This fact, combined with Theorem 4.1, now suggests
a method for approximating posterior quantities from the multiplicative
intensity model:

1. Draw $\mathbf{p} = \{C_1,\ldots,C_{n(\mathbf{p})}\}$ from $q(\mathbf{p})$. Condition on $\mathbf{p}$ and draw $Y_j^*$
independently from $\pi(dY_j^*|C_j)$ for $j = 1,\ldots,n(\mathbf{p})$.

2. Use the value for $\mathbf{Y}$ from step 1 to draw $\mu$ from $\mu|\mathbf{Y},\mathbf{X}$. That is, draw $\mu$
from the random measure $\mu_{g_m} + \sum_{j=1}^{n(\mathbf{p})} J_{j,n}\delta_{Y_j^*}$.

3. To approximate the posterior law of a functional $g(\mu)$, run the previous
steps $B$ times independently, obtaining values $\mu^{(b)}$ with importance
weights $L(\mathbf{p}^{(b)})$, for $b = 1,\ldots,B$. Approximate the law $\mathcal{P}(g(\mu) \in \cdot\,|\mathbf{X})$
with

\[ \frac{\sum_{b=1}^B I\{g(\mu^{(b)}) \in \cdot\}\,L(\mathbf{p}^{(b)})}{\sum_{b=1}^B L(\mathbf{p}^{(b)})}. \]

To approximate functions of the form $t(\mathbf{p})$, for instance, $E[\lambda(t)|\mathbf{X}]$, then
step 2 can be eliminated and estimation is based on

\[ \frac{\sum_{b=1}^B t(\mathbf{p}^{(b)})\,L(\mathbf{p}^{(b)})}{\sum_{b=1}^B L(\mathbf{p}^{(b)})}. \]
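The seating rule and the self-normalized importance estimate can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation: it assumes a generalized gamma Lévy density $s^{-\alpha-1}e^{-bs}/\Gamma(1-\alpha)$ (so that $\kappa_l = \Gamma(l-\alpha)/\Gamma(1-\alpha)\,(b+g_m(y))^{\alpha-l}$), the exponential kernel, fully observed data, and $\eta(dy) = dy$ replaced by a finite grid. All names (`cell_mass`, `wcr_draw`, the data `X`, the grid) are hypothetical. With these choices $l_{0,r}$ is the mass of the singleton cell and $l_{j,r}$ is the ratio of the cell masses after and before adding customer $r+1$.

```python
import math
import random

ALPHA, B = 0.5, 1.0          # hypothetical generalized gamma parameters

def kappa(l, c, alpha=ALPHA):
    """kappa_l = Gamma(l - alpha) / Gamma(1 - alpha) * c^(alpha - l), with c = b + g_m(y)."""
    return math.gamma(l - alpha) / math.gamma(1.0 - alpha) * c ** (alpha - l)

def cell_mass(cell, X, grid, weights, kernel, g_m):
    """Grid approximation of int [prod_{i in cell} k(X_i|y)] kappa_{|cell|} eta(dy)."""
    total = 0.0
    for y, w in zip(grid, weights):
        like = math.prod(kernel(X[i], y) for i in cell)
        total += w * like * kappa(len(cell), B + g_m(y))
    return total

def wcr_draw(X, grid, weights, kernel, g_m, rng):
    """Seat customers 0..n-1 one at a time; return (partition, L(p), q(p))."""
    mass = lambda cell: cell_mass(cell, X, grid, weights, kernel, g_m)
    partition, L, q = [], 1.0, 1.0
    for r in range(len(X)):
        scores = [mass(c + [r]) / mass(c) for c in partition]  # occupied tables
        scores.append(mass([r]))                                # new table
        total = sum(scores)                                     # predictive rate l(r)
        j = rng.choices(range(len(scores)), weights=scores)[0]
        L *= total
        q *= scores[j] / total
        if j == len(partition):
            partition.append([r])
        else:
            partition[j].append(r)
    return partition, L, q

def weighted_average(values, weights):
    """Self-normalized importance sampling estimate."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical fully observed data with the exponential kernel k(x|y) = exp(-x y).
X = [0.4, 1.1, 0.7, 2.3]
kernel = lambda x, y: math.exp(-x * y)
g_m = lambda y: sum((1.0 - math.exp(-y * x)) / y for x in X)
grid = [0.1 * (i + 0.5) for i in range(50)]     # eta(dy) = dy on (0, 5), discretized
weights = [0.1] * len(grid)

rng = random.Random(0)
draws = [wcr_draw(X, grid, weights, kernel, g_m, rng) for _ in range(200)]
estimate = weighted_average([len(p) for p, _, _ in draws], [L for _, L, _ in draws])
print("estimated posterior mean number of tables:", estimate)
```

Because the seating probabilities are ratios of cell masses, the product $L(\mathbf{p})q(\mathbf{p})$ telescopes to the product of the final cell masses, which is the unnormalized posterior weight of $\mathbf{p}$. Here the functional $t(\mathbf{p})$ is the number of tables $n(\mathbf{p})$, so step 2 is not needed.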

Remark 11. Note that the main difficulty in step 2 is to approximate a
draw from $\mu_{g_m}$. There are several methods discussed in the literature. See,
for instance, [4, 6, 42] for some possible ideas and further references in the
general setting.

We next present some explicit examples of the posterior distribution of $\mu$
based on the results in Theorem 4.1.



4.4.1. Generalized gamma process. Brix [6] proposes an interesting class
of measures by specifying $\mu$ to be a generalized gamma random measure.
Using the description of Brix [6], these are $\mu$ processes with Lévy measure

\[ \rho_{\alpha,b}(ds) = \frac{1}{\Gamma(1-\alpha)}\,s^{-\alpha-1} e^{-bs}\,ds. \]

The values for $\alpha$ and $b$ are restricted to satisfy $0 < \alpha < 1$ and $0 \le b < \infty$
or $-\infty < \alpha \le 0$ and $0 < b < \infty$. Different choices for $\alpha$ and $b$ in $\rho_{\alpha,b}$ yield
various subordinators. These include the stable subordinator when $b = 0$,
the gamma process subordinator when $\alpha = 0$ and the inverse-Gaussian
subordinator when $\alpha = 1/2$ and $b > 0$. When $\alpha < 0$, this results in a class of
gamma compound Poisson processes. Nieto-Barajas and Walker [34] provide
analysis for a random distribution function on $(0,\infty)$, as in (27), where
$k$ is an exponential kernel and where $\mu$ is modeled as a weighted version of a
gamma compound Poisson process. This turns out to be an inhomogeneous
variation of a subclass of the models of Brix [6] with $\alpha = -1$ and $b = b(y)$
in $BM_+(\mathcal{Y})$. The weighted gamma process considered in [32] corresponds to
the choice of $\alpha = 0$ and $b = b(y)$.

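The posterior distribution of $\mu$, given $(\mathbf{X},\mathbf{Y})$, is equivalent to the
conditional distribution of the random measure
$\mu_{g_m} + \sum_{j=1}^{n(\mathbf{p})} (b + g_m(Y_j^*))^{-1} G_{j,n}\,\delta_{Y_j^*}$,
where $\mu_{g_m}$ is an inhomogeneous generalized gamma process with intensity

\[ \frac{1}{\Gamma(1-\alpha)}\,e^{-(g_m(y)+b)s} s^{-\alpha-1}\,ds\,\eta(dy), \]

and $(G_{j,n})$ are independent gamma random variables with shape $e_{j,n} - \alpha$
and unit scale. It follows that the conditional moment measure is

\[ E[\mu^*_{n,m}(dy)|\mathbf{X},\mathbf{Y}] = (b + g_m(y))^{\alpha-1}\,\eta(dy)
   + \sum_{j=1}^{n(\mathbf{p})} \frac{e_{j,n} - \alpha}{b + g_m(Y_j^*)}\,\delta_{Y_j^*}(dy). \]

The joint moment measure of $\mathbf{Y}$ can be expressed as

\[ M_\mu(d\mathbf{Y}|\rho_{\alpha,b+g_m}\eta) = \prod_{j=1}^{n(\mathbf{p})}
   \frac{\Gamma(e_{j,n} - \alpha)}{\Gamma(1-\alpha)}\,(b + g_m(Y_j^*))^{-(e_{j,n}-\alpha)}\,\eta(dY_j^*), \]

which, if $b = b(y)$, generalizes an expression for the weighted gamma process;
see [19, 32]. Note that, for $r = 0, 1,\ldots,n-1$,

\[ l_{0,r} = \int_{\mathcal{Y}} k(X_{r+1}|y)\,(b + g_m(y))^{\alpha-1}\,\eta(dy) \]

and

\[ l_{j,r}(Y_j^*) = k(X_{r+1}|Y_j^*)\,\frac{e_{j,r} - \alpha}{b + g_m(Y_j^*)}. \]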
4.4.2. Smoothed spatial beta process. Given the conjugacy properties of
the beta process when used as a cumulative hazard prior in [14] under right
censoring, it is natural to think of a smooth version of this process to model
hazard rates. This is in analogy to smoothing the Nelson-Aalen estimator.
Here we allow an extension to $y = (y_1, y_2)$, with $y_1 \in (0,\infty)$, by specifying

\[ \rho(ds|y_1) = c(y_1)\,s^{-1}(1-s)^{c(y_1)-1}\,ds \]

and writing $\eta(dy_1,dy_2)$, where $c$ is some positive function. Note, however,
that the posterior behavior is quite different in this context than in [14].
The measure $\mu_{g_m}$ corresponds to a completely random measure with Lévy
measure $c(y_1) e^{-g_m(y)s} s^{-1}(1-s)^{c(y_1)-1}\,ds\,\eta(dy_1,dy_2)$ and, hence, is not a beta
process. Additionally, the distribution of $J_{j,n}$ is

\[ P(J_{j,n} \in ds|Y_j^*) \propto s^{e_{j,n}-1}(1-s)^{c(Y^*_{1,j})-1} e^{-g_m(Y_j^*)s}\,ds, \]

where the normalizing constant depends on the Laplace transform of a beta
random variable evaluated at $g_m(Y_j^*)$. In other words, it is related to the
confluent hypergeometric function

\[ {}_1F_1(e_{j,n};\,c(Y^*_{1,j}) + e_{j,n};\,-g_m(Y_j^*))
   = \frac{\Gamma(c(Y^*_{1,j}) + e_{j,n})}{\Gamma(c(Y^*_{1,j}))\Gamma(e_{j,n})}
     \int_0^1 e^{-g_m(Y_j^*)u} u^{e_{j,n}-1}(1-u)^{c(Y^*_{1,j})-1}\,du. \]

For some simplification, hereafter we set $c$ equal to the constant $\theta$. Then it
follows that one can write $E[\mu^*_{n,m}(dy)|\mathbf{Y},\mathbf{X}]$ as

\[ \Bigl[\theta \int_0^1 e^{-g_m(y)s}(1-s)^{\theta-1}\,ds\Bigr] \eta(dy)
   + \sum_{j=1}^{n(\mathbf{p})} \frac{e_{j,n}}{e_{j,n}+\theta}\,
     \frac{{}_1F_1(e_{j,n}+1;\,\theta + e_{j,n} + 1;\,-g_m(Y_j^*))}{{}_1F_1(e_{j,n};\,\theta + e_{j,n};\,-g_m(Y_j^*))}\,\delta_{Y_j^*}(dy), \]

and the joint marginal measure $M_\mu(d\mathbf{Y}|e^{-f_{k,m}}\nu)$ is

\[ \prod_{j=1}^{n(\mathbf{p})} \frac{\theta\,\Gamma(e_{j,n})\Gamma(\theta)}{\Gamma(e_{j,n}+\theta)}\,
   {}_1F_1(e_{j,n};\,\theta + e_{j,n};\,-g_m(Y_j^*))\,\eta(dY_j^*). \]
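The link between the jump distribution and Kummer's function ${}_1F_1$ can be verified numerically. The sketch below is an illustration with hypothetical values of $e_{j,n}$, $\theta$ and $g_m(Y_j^*)$; it evaluates ${}_1F_1$ by its power series (adequate for moderate arguments, not a general-purpose implementation) and compares it with direct quadrature of the beta-type integral.

```python
import math

def hyp1f1(a, b, z, terms=200):
    """Power series sum_k (a)_k / (b)_k z^k / k! for Kummer's function 1F1(a; b; z)."""
    term, total = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) / (b + k) * z / (k + 1)
        total += term
    return total

def beta_laplace_integral(e, theta, g, steps=20000):
    """Midpoint rule for int_0^1 s^(e-1) (1-s)^(theta-1) exp(-g s) ds."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        total += s ** (e - 1) * (1.0 - s) ** (theta - 1) * math.exp(-g * s)
    return h * total

def jump_mean(e, theta, g):
    """E[J] = e / (e + theta) * 1F1(e+1; theta+e+1; -g) / 1F1(e; theta+e; -g)."""
    return e / (e + theta) * hyp1f1(e + 1, theta + e + 1, -g) / hyp1f1(e, theta + e, -g)

e, theta, g = 2, 3.0, 1.5            # hypothetical cell size, theta and g_m(Y_j^*)
beta_const = math.gamma(e) * math.gamma(theta) / math.gamma(e + theta)
print(beta_laplace_integral(e, theta, g), beta_const * hyp1f1(e, theta + e, -g))
print(jump_mean(e, theta, g),
      beta_laplace_integral(e + 1, theta, g) / beta_laplace_integral(e, theta, g))
```

The second comparison confirms that the posterior mean of a jump is the ratio of two consecutive beta-type integrals, which is where the factor $e_{j,n}/(e_{j,n}+\theta)$ originates.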



5. Proof of Proposition 2.2. In this section we present two results which 
when combined lead to a proof of Proposition 2.2. 



Proposition 5.1. Suppose the $(\mathbf{W}, N)$ are measurable elements in the
space $\mathcal{W}^n \times \mathcal{M}$ having the joint measure in (11), where $N$ is a Poisson
random measure with sigma-finite nonatomic mean measure $\nu$. Then the
following disintegration holds:

\[ \Bigl[\prod_{i=1}^n N(dW_i)\Bigr] \mathcal{P}(dN|\nu)
   = \mathcal{P}(dN|\nu,\mathbf{W})\,\nu(dW_1) \prod_{i=2}^n \Bigl[\nu(dW_i) + \sum_{j=1}^{n(\mathbf{p}_{i-1})} \delta_{W_j^*}(dW_i)\Bigr], \tag{41} \]

where $\mathcal{P}(dN|\nu,\mathbf{W})$ corresponds to the law of $N$ determined by (15) and is
representable in distribution as (13). The statement implies that
$M(d\mathbf{W}|\nu) = \nu(dW_1) \prod_{i=2}^n [\nu(dW_i) + \sum_{j=1}^{n(\mathbf{p}_{i-1})} \delta_{W_j^*}(dW_i)]$.
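Proof. First note the equivalence for $M(d\mathbf{W}|\nu)$ follows by integrating
out $N$ in (41). The result proceeds by induction. The case for $n = 1$, (2), is
true. Now assuming that the result is true for $n = r$, it follows that

\[ \Bigl[\prod_{i=1}^{r+1} N(dW_i)\Bigr] \mathcal{P}(dN|\nu) = N(dW_{r+1})\,\mathcal{P}(dN|\nu,\mathbf{W}_r)\,M(d\mathbf{W}_r|\nu), \]

which implies the form of $M(d\mathbf{W}_{r+1}|\nu)$, and, hence, it remains to show that

\[ N(dW_{r+1})\,\mathcal{P}(dN|\nu,\mathbf{W}_r) = \mathcal{P}(dN|\nu,\mathbf{W}_{r+1})
   \Bigl[\nu(dW_{r+1}) + \sum_{j=1}^{n(\mathbf{p}_r)} \delta_{W_j^*}(dW_{r+1})\Bigr]. \]

First, for functions $s$ and $f$ in $BM_+(\mathcal{W})$, note that, by a change of measure,

\[ \int_{\mathcal{M}} \int_{\mathcal{W}} s(w) e^{-N(f)} N(dw)\,\mathcal{P}(dN|\nu,\mathbf{W}_r)
   = \Bigl[\prod_{j=1}^{n(\mathbf{p}_r)} e^{-f(W_j^*)}\Bigr] \int_{\mathcal{M}} g(N^*) e^{-N(f)}\,\mathcal{P}(dN|\nu), \tag{42} \]

where $g(N^*) = \int_{\mathcal{W}} s(w) N^*(dw) = \int_{\mathcal{W}} s(w) N(dw) + \sum_{j=1}^{n(\mathbf{p}_r)} s(W_j^*)$. Applying
Proposition 2.1 to the right-hand side of (42) shows that the expressions
in (42) are equal to

\[ \mathcal{L}_N(f|\nu) \Bigl[\prod_{j=1}^{n(\mathbf{p}_r)} e^{-f(W_j^*)}\Bigr]
   \Bigl[\int_{\mathcal{M}} \int_{\mathcal{W}} s(w) N(dw)\,\mathcal{P}(dN|e^{-f}\nu) + \sum_{j=1}^{n(\mathbf{p}_r)} s(W_j^*)\Bigr]. \]

It follows that the conditional Laplace functional of $N$, given
$\mathbf{W}_{r+1} := (\mathbf{W}_r, W_{r+1})$, relative to $M(d\mathbf{W}_{r+1}|\nu)$, is determined by the expression

\[ \mathcal{L}_N(f|\nu) \Bigl[\prod_{j=1}^{n(\mathbf{p}_r)} e^{-f(W_j^*)}\Bigr]
   \Bigl[\int_{\mathcal{W}} s(W_{r+1}) e^{-f(W_{r+1})}\,\nu(dW_{r+1}) + \sum_{j=1}^{n(\mathbf{p}_r)} s(W_j^*)\Bigr]. \]

Now define a function $t(W_{r+1})$ to be $e^{-f(W_{r+1})}$ if $W_{r+1}$ is not equal to
any of the $\{W_1^*,\ldots,W^*_{n(\mathbf{p}_r)}\}$ and is set to be one otherwise. Then, since $\nu$
is nonatomic, it follows that

\[ \int_{\mathcal{W}} s(W_{r+1})\,t(W_{r+1}) \Bigl[\nu(dW_{r+1}) + \sum_{j=1}^{n(\mathbf{p}_r)} \delta_{W_j^*}(dW_{r+1})\Bigr]
   = \int_{\mathcal{W}} s(W_{r+1}) e^{-f(W_{r+1})}\,\nu(dW_{r+1}) + \sum_{j=1}^{n(\mathbf{p}_r)} s(W_j^*). \]

Hence, the conditional Laplace functional of $N$, given $\mathbf{W}_{r+1}$, with respect
to $M(d\mathbf{W}_{r+1}|\nu)$ is

\[ \mathcal{L}_N(f|\nu) \Bigl[\prod_{j=1}^{n(\mathbf{p}_r)} e^{-f(W_j^*)}\Bigr] t(W_{r+1})
   = \mathcal{L}_N(f|\nu) \Bigl[\prod_{j=1}^{n(\mathbf{p}_{r+1})} e^{-f(W_j^*)}\Bigr], \tag{43} \]

as desired. □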



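The next result, which builds on Proposition 5.1, establishes the partition
representation of $M(d\mathbf{W}|\nu)$.

Proposition 5.2. For $i = 1,\ldots,n$, let $g_i$ be nonnegative functions in
$BM(\mathcal{W})$. Then

\[ \int_{\mathcal{M}} \Bigl[\prod_{i=1}^n \int_{\mathcal{W}} g_i(w_i) N(dw_i)\Bigr] \mathcal{P}(dN|\nu)
   = \sum_{\mathbf{p}} \prod_{j=1}^{n(\mathbf{p})} \int_{\mathcal{W}} \Bigl[\prod_{i\in C_j} g_i(w)\Bigr] \nu(dw). \tag{44} \]

Equivalently, $M(d\mathbf{W}|\nu) = \prod_{j=1}^{n(\mathbf{p})} \nu(dW_j^*)$.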

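Proof. The proof of (44) proceeds by induction. Case $n = 1$ is obvious.
Now suppose it is true for $n = r$. Let $\mathbf{p}_{r+1}$ denote a partition of $\{1,\ldots,r+1\}$,
and define, for each $r > 0$,

\[ \phi_g(\mathbf{p}_r) = \prod_{j=1}^{n(\mathbf{p}_r)} \int_{\mathcal{W}} \Bigl[\prod_{i\in C_{j,r}} g_i(w)\Bigr] \nu(dw). \]

It follows that $\phi_g(\mathbf{p}_{r+1})$ is $\phi_g(\mathbf{p}_r) \int_{\mathcal{W}} g_{r+1}(v)\,\nu(dv)$ if $n(\mathbf{p}_{r+1}) = n(\mathbf{p}_r) + 1$.
Otherwise, if the index $r+1$ is in an existing cell/table $C_{i,r}$, then it is
equivalent to $\phi_g(\mathbf{p}_r) \int_{\mathcal{W}} g_{r+1}(v)\,\pi_g(dv|C_{i,r})$, where

\[ \pi_g(dv|C_{i,r}) = \frac{[\prod_{l\in C_{i,r}} g_l(v)]\,\nu(dv)}{\int_{\mathcal{W}} [\prod_{l\in C_{i,r}} g_l(w)]\,\nu(dw)} \]

for $i = 1,\ldots,n(\mathbf{p}_r)$. Note that this implies that

\[ \sum_{\mathbf{p}_{r+1}} \phi_g(\mathbf{p}_{r+1}) = \sum_{\mathbf{p}_r} \phi_g(\mathbf{p}_r)
   \Bigl[\int_{\mathcal{W}} g_{r+1}(v)\,\nu(dv) + \sum_{i=1}^{n(\mathbf{p}_r)} \int_{\mathcal{W}} g_{r+1}(v)\,\pi_g(dv|C_{i,r})\Bigr]. \]

Now, by (simple algebra) and the induction hypothesis on $r$, it follows that

\[ \sum_{\mathbf{p}_{r+1}} \phi_g(\mathbf{p}_{r+1}) = \int_{\mathcal{W}^r} \Bigl[\prod_{i=1}^r g_i(W_i)\Bigr]
   \Bigl[\int_{\mathcal{W}} g_{r+1}(v)\,\nu(dv) + \sum_{j=1}^{n(\mathbf{p}_r)} g_{r+1}(W_j^*)\Bigr] M(d\mathbf{W}_r|\nu). \]

Now, utilizing the fact that
$M(d\mathbf{W}_{r+1}|\nu) = [\nu(dW_{r+1}) + \sum_{j=1}^{n(\mathbf{p}_r)} \delta_{W_j^*}(dW_{r+1})] \times M(d\mathbf{W}_r|\nu)$
concludes the proof. Note this last statement relies on the result
in Proposition 5.1. □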
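The partition identity (44) can be checked exactly in a small finite setting. The sketch below is an illustration under a stated departure from the proposition: it takes $\mathcal{W}$ to be a two-point set, so $\nu$ has atoms and the tie structure of Proposition 5.1 does not literally apply, but the moment identity itself still holds because the joint cumulants of $(N(g_1),\ldots,N(g_n))$ are $\int g_1\cdots g_n\,d\nu$ for any Poisson random measure. The functions `set_partitions`, `partition_side` and `direct_side`, and the numerical values of `nu` and `g`, are hypothetical.

```python
import math
from itertools import product

def set_partitions(items):
    """Yield all partitions of a list as lists of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part
        for j in range(len(part)):
            yield part[:j] + [[first] + part[j]] + part[j + 1:]

def partition_side(g, nu):
    """Right side of (44): sum_p prod_j sum_w [prod_{i in C_j} g_i(w)] nu(w)."""
    total = 0.0
    for part in set_partitions(list(range(len(g)))):
        term = 1.0
        for block in part:
            term *= sum(nu[w] * math.prod(g[i][w] for i in block) for w in range(len(nu)))
        total += term
    return total

def poisson_moment(k, rate, terms=80):
    """E[N^k] for N ~ Poisson(rate), by direct summation of the probability mass function."""
    pmf, total = math.exp(-rate), 0.0
    for x in range(terms):
        total += (x ** k) * pmf
        pmf *= rate / (x + 1)
    return total

def direct_side(g, nu):
    """Left side of (44): E[prod_i sum_w g_i(w) N_w] with independent N_w ~ Poisson(nu[w])."""
    total = 0.0
    for ws in product(range(len(nu)), repeat=len(g)):
        coef = math.prod(g[i][w] for i, w in enumerate(ws))
        moment = math.prod(poisson_moment(ws.count(w), nu[w]) for w in range(len(nu)))
        total += coef * moment
    return total

nu = [0.7, 1.3]                                   # hypothetical mean measure on two points
g = [[1.0, 2.0], [0.5, 1.5], [2.0, 0.3]]          # hypothetical g_1, g_2, g_3
print(partition_side(g, nu), direct_side(g, nu))
```

For $n = 3$ the sum runs over the five partitions of $\{1,2,3\}$, and the two sides agree to floating-point accuracy.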

Remark 12. The proof of Proposition 5.2 follows closely an unpublished
proof by Albert Lo for the case of gamma processes. That is, it is
an alternative proof for Lemma 2 in [30] which yields the appropriate
partition representation for integrals with respect to a Blackwell-MacQueen
urn distribution derived from a Dirichlet process. The style of proof exploits
properties of partitions similar to those stated in [36], Proposition 10. Details
in the proof of Proposition 5.2 translate into justifications for generalizations
of weighted Chinese restaurant algorithms.